Certain trends in the investigations of the intelligence of rural 
and mountain children raise questions of vital importance to education. 
Does the deviation of these children from the normal distribution 
indicate that they are inherently inferior? Are they by training 
and experience made less capable of dealing with intelligence tests 
than children in other environments? To what extent do intelligence 
ratings vary as a result of improving environmental conditions? Is 
the decrease in IQ with an increase in chronological age due to defects 
in the process of maturation of intelligence, or to the increasing 
influence of poor cultural conditions? 

Although investigators agree that variations do occur in the IQ, 
there is much controversy concerning the causes of these discrepancies. 
The Thirty-ninth Yearbook of the National Society for the Study of 
Education, sequential to the Twenty-seventh Yearbook, reviews the 
investigations and presents the nature-nurture discussion from various 
points of view. All the studies but one indicate that rural children 
make lower scores on intelligence tests than city children; the Scottish 
Council for Educational Research reports that no difference in 
intelligence is found between the rural and urban children of Scotland, and 
offers the explanation that, “nowhere has scholastic opportunity 
been more evenly equated than in Scotland—99.7 per cent of Scottish 
teachers are fully trained.”* Most investigators feel that environmental 
differences influence the lower ratings of rural children. 


* See 6, Part II, p. 273. 


INTELLIGENCE OF MOUNTAIN CHILDREN 


In 1930 we made a study of the intelligence of East Tennessee 
Mountain children.¹² The Dearborn IA and IIC Intelligence Tests 
were given to eleven hundred forty-seven children in Grades I-VIII 
from twenty-one mountain schools, and the Illinois Intelligence Test 
was given to five hundred sixty-four of these cases in Grades III-VIII. 
The median IQ was 82 on the Dearborn and 78 on the Illinois Test. 
The IQ on the Dearborn Test was 95 at age six, and decreased to 74 
at age sixteen. A marked school retardation was evident, from 
one-and-a-half years in the first grade to over two years in the eighth 
grade. The conclusions reached were: (1) The results of both tests 
were materially affected by environmental factors, (2) the mountain 
children were not as far below normal intelligence as the tests indicated, 
and (3) with proper environmental changes the mountain children 
might test near a normal group. It was noted that, 


The growing educational opportunities in the mountains are materially 
changing the isolated sections. The State is providing modern and adequate 
schools in the very heart of the mountains, and is sending well-trained teachers, 
many of whom are holding or working toward college degrees, into those 
schools to teach the mountain children. . . . Educational opportunities of 
the mountains have advanced with the improvement of roads, thus enabling 
consolidation of schools in a number of sections. As this is only a recent 
development, it will be interesting to note the influence of better schools on 
the results of later intelligence test data on the same groups of children.* 

* See 12, p. 354. 


Ten years have elapsed since this initial study was made, and we 
have retested the same mountain areas. The data for this second 
investigation were gathered during the Spring and Fall of 1940. 
Obviously we could not retest the same children, but we have repeated 
the same test on children in the same areas and largely from the same 
families; ninety-one per cent of the families represented in this study 
have been life-residents of the area, eight per cent have moved into 
the areas since 1930 from adjacent Appalachian Mountain sections, 
leaving only one per cent shifting into the mountains from 
undetermined areas. The overlapping of a majority of the family names in 
the two studies agrees with this general trend, and the data indicate 
that any major changes found in the results of the intelligence tests 
are due to factors other than population shift. 


SOME ENVIRONMENTAL CHANGES IN THE MOUNTAIN AREA 


During the past decade there have been many changes in the 
economic, social and cultural life of these mountain people. The State 
has completed an excellent road system which gives every community 
access to progressive areas outside of the mountains, and has developed 
transportation facilities for schools and industry. Our data show that 
about sixty per cent of the families in one county and forty per cent 
in another had one or more members working in industrial plants. 
In 1930 neither State nor county provided transportation; in 1940, 
two thousand three hundred sixteen children were transported daily 
to and from school. This probably accounts for much of the seventeen 
per cent increase in enrollment in 1940, and for the thirty-two per 
cent higher average daily attendance. Allotting State money to county 
schools on the basis of average daily attendance now 
stimulates the teachers and community to keep the children in school. 
Hot lunches are served regularly in all the larger schools. 

There has been a general shift from the one-room to larger schools, 
a reorganization made possible by improved roads. An improvement 
is also indicated in the types of school buildings. Many schools now 
have adequate playgrounds and fairly well equipped gymnasiums. A 
circulating library, maintained by the State and counties, makes 
available around fourteen thousand volumes for these schools, and 
free textbooks are furnished for the first three grades. While a decade 
ago the average training of the teachers was less than two years of 
college work, today it is about three years. A majority of the teachers 
are either college graduates or receiving training-in-service from 
accredited teacher-training colleges. New teachers employed are 
required to have four years of college training. Well-trained, 
progressive college graduates have displaced the politically appointed 
county superintendents of a decade ago. An excellent supervisory 
program is provided for the area with well-trained county supervisors 
and a State regional supervisor who assists in coordinating instruction. 
Schools have been improved by the innovation of a State rating system 
based on points for improved instruction, additional books and 
materials, provision for health facilities and general equipment. 

During the past ten years the rapid growth of industry in the area 
has enabled families to supplement their agricultural livelihood with 
ready cash through employment in the rayon, lumber, pottery and 
other industrial plants. Farming methods have materially changed; 
pasture lands now replace many of the corn fields on the rough 
mountain slopes, and stock raising and dairy farming are proving profitable. 
Small but modern frame houses located on or near the main highways 
have replaced many of the log cabins and small rough-board houses. 
There has been unusual development in the area, and the improvement 
in roads, schools, agriculture and the economic life of the communities 
has materially changed the general environment of these people. 

In the 1930 investigation nine hundred forty-six children were 
tested from twenty-one different schools. In 1940 all the children 
in these schools, and an additional two thousand cases in nineteen 
other mountain schools, were tested in order to increase the statistical 
reliability of the study. Comparisons were made between the original 
and additional schools, and, when no significant differences were found 
between the distribution of intelligence in the two groups, the data 
were combined for subsequent treatment. Apparently any significant 
changes in IQ are not due to the additional cases. The median IQ 
for the original twenty-one schools is 87.6 ± .34, and for the additional 
schools, 87.2 ± .32. The administration and scoring of the tests 
were under the same supervision as in 1930. 

The average mountain child is eight months younger for his grade 
than ten years ago, as shown in Table I and Fig. 1. The differences 
in chronological age range from three months in Grade I to fifteen 
months in Grade V. There is a consistent difference in favor of the 
1940 group in each grade and a significant difference in most of the 
grades, substantiating other investigations which indicate that age- 
grade retardation decreases with improvement in instruction and 
general educational opportunities. There seems to be a tendency 
for the older children to leave the elementary school earlier than they 
did ten years ago, probably due to better opportunities for high-school 
attendance and industrial employment. The degree of age-grade 
retardation in the Tennessee mountains is practically the same as that 
reported by Edwards and Jones for Georgia mountains. Sherman 
and Key found a larger amount among Virginia mountain children, 
and the Appalachian areas in general appear to have a much 
greater problem of retardation than the Iowa rural schools studied 
by Baldwin. 


INCREASE IN MENTAL AGE DURING LAST DECADE 


Table I and Fig. 1 show a comparison of the median mental ages. 
An average mental age for all grades shows the 1940 group has gained 
about nine months over the 1930 group, or nearly one mental month 
a year for ten years. In other words, the average mountain child in 
1940 is three-fourths of a year more advanced mentally for his grade than 
the average mountain child in 1930 was. The greatest differences 
between the two groups occur in the seventh and eighth grades, and 
the smallest in Grade III. Figure 1 shows that mental age increases 
fairly consistently from grade to grade, but falls below the 
chronological age level. The 1930 group was definitely older chronologically 
but younger mentally than the 1940 group; the difference between 
MA and CA of the 1940 group is about a third of that found ten years 
ago. There is still a tendency for the increments of mental growth 
to decrease with an increase in chronological age. During the first 
and second grades the mental age of the 1940 group lacks only four months 
of normality.* This increases to eight months in Grade III, eleven 
months in Grade IV, and twelve months in Grades V-VIII. While 
the 1940 group is perfectly normal at ages six, seven and eight,† the 
mental age falls below the chronological age in Grades I, II and III 
as a result of overageness and age-grade retardation. Beginning 
at age nine, the 1940 mental age falls from four months below normal 
to twenty-five months at fourteen years. The average difference 
between MA and normal CA is about a third as great as it was in 
1930. These trends are shown also in the comparisons in Table III. 

* Median CA, Table IV, less median MA, Table V. 
† CA of 6-6, 7-6, 8-6, represents normal age for Grades I, II, III, etc. 


TABLE I. COMPARISON OF MENTAL AGES OF MOUNTAIN CHILDREN ACCORDING 
TO GRADES AND CHRONOLOGICAL AGES 
(Ages in years-months) 

                  Median   Median                             Median 
                    CA       MA                                 MA 
Grade I                              Six-year-olds 
  1930              7-5      6-3       1930                     6-1 
  1940              7-2      6-10      1940                     6-8 
  Difference        0-3      0-7       Difference               0-7 
Grade II                             Seven-year-olds 
  1930              9-2      7-6       1930                     6-10 
  1940              8-5      8-1       1940                     7-6 
  Difference        0-9      0-7       Difference               0-8 
Grade III                            Eight-year-olds 
  1930             10-2      8-2       1930                     7-7 
  1940              9-3      8-7       1940                     8-6 
  Difference        0-11     0-5       Difference               0-11 
Grade IV                             Nine-year-olds 
  1930             11-4      8-11      1930                     8-2 
  1940             10-7      9-8       1940                     9-2 
  Difference        0-9      0-9       Difference               1-0 
Grade V                              Ten-year-olds 
  1930             12-7      9-8       1930                     8-10 
  1940             11-4     10-4       1940                     9-7 
  Difference        1-3      0-8       Difference               0-9 
Grade VI                             Eleven-year-olds 
  1930             13-1     10-7       1930                     9-3 
  1940             12-5     11-5       1940                    10-9 
  Difference        0-8      0-10      Difference               1-6 
Grade VII                            Twelve-year-olds 
  1930             14-6     11-4       1930                    10-2 
  1940             13-7     12-7       1940                    11-3 
  Difference        0-11     1-3       Difference               1-1 
Grade VIII                           Thirteen-year-olds 
  1930             15-0     12-3       1930                    10-7 
  1940             14-4     13-4       1940                    11-11 
  Difference        0-8      1-1       Difference               1-4 
                                     Fourteen-year-olds 
                                       1930                    10-11 
                                       1940                    12-4 
                                       Difference               1-5 
                                     Fifteen-year-olds 
                                       1930                    11-4 
                                       1940                    12-7 
                                       Difference               1-3 
                                     Sixteen-year-olds 
                                       [year illegible]        12-2 
                                       Difference               1-1 


Fig. 1. Comparison of chronological and mental ages of mountain children according 
to grade, 1930 and 1940 groups. 


TABLE II. COMPARISON OF IQ’S OF MOUNTAIN CHILDREN ACCORDING TO GRADES 

                 No.    Median 
                cases     IQ      PE      Q-1      Q-3      Q       Range 
Grade I 
  1930           115    84.10    1.14    74.91    94.42    9.76   45 to 115 
  1940           371    95.14     .57    86.08   103.73    8.83   50 to 165 
  Difference     ...    11.04    1.27 
Grade II 
  1930            87    85.40    1.04    77.08    92.66    7.79   45 to 125 
  1940           282    95.66     .74    84.69   104.58    9.95   50 to 166 
  Difference     ...    10.26    1.27 
Grade III 
  1930           103    83.96    1.27    72.92    93.54   10.32   45 to 150 
  1940           500    92.84     .64    80.67   103.62   11.48   35 to 160 
  Difference     ...     8.88    1.42 
Grade IV 
  1930           172    81.50    1.18    69.30    94.17   12.44   45 to 135 
  1940           491    92.38     .71    79.83   104.83   12.50   45 to 160 
  Difference     ...    [illegible] 
Grade V 
  1930           137    76.10    1.13    67.58    88.75   10.59   50 to 125 
  1940           4?5    91.36     .74    78.66   104.19   12.77   50 to 160 
  Difference     ...    15.26    1.35 
Grade VI 
  1930           117    81.90    1.17    73.63    93.84   10.12   55 to 125 
  1940           458    92.41     .68    80.66   103.78   11.56   50 to 160 
  Difference     ...    [illegible] 
Grade VII 
  1930           128    79.50    1.00    73.23    91.36    9.07   50 to 130 
  1940           360    92.63     .75    81.75   104.44   11.35   50 to 150 
  Difference     ...    [illegible] 
Grade VIII 
  1930            87    84.80    1.13    73.25    90.14    8.45   55 to 120 
  1940           325    93.23     .76    82.78   104.80   11.01   40 to 150 
  Difference     ...     8.43    1.36 
All Grades 
  1930           946    82.40     .40    72.70    92.62    9.96   45 to 150 
  1940          3262    92.22     .25    81.47   104.22   11.38   35 to 166 
  Difference     ...    [illegible] 


TABLE III. COMPARISON OF IQ’S OF MOUNTAIN CHILDREN ACCORDING TO CA 

                 No.    Median 
                cases     IQ      PE      Q-1      Q-3      Q       Range 
Six-year-olds 
  1930            33    94.68    2.03    89.06   107.75    9.34   75 to 115 
  1940           188   102.56     .64    95.34   109.38    7.02   75 to 165 
  Difference     ...     7.88    2.12 
Seven-year-olds 
  1930            62    90.90    1.38    81.25    98.61    8.68   55 to 125 
  1940           244    99.85     .66    91.38   107.77    8.19   65 to 160 
  Difference     ...     8.95    1.54 
Eight-year-olds 
  1930            60    88.88    1.09    82.33    95.90    6.78   40 to 110 
  1940           322    99.18     .70    90.50   110.71   10.10   60 to 160 
  Difference     ...    10.30    1.29 
Nine-year-olds 
  1930            94    86.38    1.79    79.95    95.22    7.72   60 to 145 
  1940           324    96.44     .74    85.73   107.07   10.67   55 to 160 
  Difference     ...    10.06    1.93 
Ten-year-olds 
  1930            99    84.25    1.64    74.29    94.75   10.23   50 to 125 
  1940           383    91.44     .73    81.33   104.04   11.36   45 to 160 
  Difference     ...     7.19    1.79 
Eleven-year-olds 
  1930           102    80.00    1.49    70.19    94.32   12.06   50 to 130 
  1940           358    93.87     .88    80.36   106.95   13.30   50 to 150 
  Difference     ...    13.87    1.73 
Twelve-year-olds 
  1930           107    81.41    1.06    74.25    91.88    8.81   50 to 135 
  1940           365    90.17     .80    75.67   100.13   12.23   35 to 150 
  Difference     ...     8.76    1.33 
Thirteen-year-olds 
  1930           109    77.61    1.22    66.56    86.97   10.21   45 to 120 
  1940           319    87.75     .87    75.30   100.04   12.37   50 to 145 
  Difference     ...    10.14    1.48 
Fourteen-year-olds 
  1930           125    74.72    1.09    63.39    82.80    9.71   45 to 115 
  1940           257    85.06     .75    74.05    93.33    9.64   50 to 125 
  Difference     ...    10.34    1.32 
Fifteen-year-olds 
  1930            61    73.44    1.39    65.56    82.93    8.71   59 to 95 
  1940           116    81.33    1.13    73.00    92.50    9.75   50 to 110 
  Difference     ...     7.89    1.79 
Sixteen-year-olds 
  1930            29    73.50    2.41    64.06    84.81   10.37   45 to 95 
  1940            34    80.00    2.08    68.12    87.50    9.69   40 to 110 
  Difference     ...     6.50    3.18 
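
The grade comparisons drawn from Table I can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the medians are read as years-months, transcribes the eight grade rows of Table I, and uses helper names (`to_months`, `ca_ma_gaps`) that are not part of the study.

```python
# Illustrative check of the Table I comparisons; helper names are hypothetical.

def to_months(age: str) -> int:
    """Convert a 'years-months' age such as '7-5' to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)

# Median (CA, MA) for Grades I-VIII, transcribed from Table I.
TABLE_I = {
    1930: [("7-5", "6-3"), ("9-2", "7-6"), ("10-2", "8-2"), ("11-4", "8-11"),
           ("12-7", "9-8"), ("13-1", "10-7"), ("14-6", "11-4"), ("15-0", "12-3")],
    1940: [("7-2", "6-10"), ("8-5", "8-1"), ("9-3", "8-7"), ("10-7", "9-8"),
           ("11-4", "10-4"), ("12-5", "11-5"), ("13-7", "12-7"), ("14-4", "13-4")],
}

def ca_ma_gaps(year: int) -> list[int]:
    """Months by which median MA falls short of median CA, grade by grade."""
    return [to_months(ca) - to_months(ma) for ca, ma in TABLE_I[year]]

def mean(values: list[int]) -> float:
    return sum(values) / len(values)

gaps_1930 = ca_ma_gaps(1930)
gaps_1940 = ca_ma_gaps(1940)

# Gain in median mental age from 1930 to 1940, grade by grade.
ma_gain = [to_months(ma40) - to_months(ma30)
           for (_, ma30), (_, ma40) in zip(TABLE_I[1930], TABLE_I[1940])]

# Size of the 1940 CA-MA gap relative to the 1930 gap.
gap_ratio = mean(gaps_1940) / mean(gaps_1930)

print("CA-MA gap by grade, 1940 (months):", gaps_1940)
print("Mean MA gain (months):", mean(ma_gain))
print("1940 gap as a fraction of the 1930 gap:", round(gap_ratio, 2))
```

Run on the transcribed medians, the 1940 gaps come out as 4, 4, 8, 11, 12, 12, 12 and 12 months, the mean gain in mental age is a little over nine months, and the 1940 gap is roughly a third of the 1930 gap, in line with the statements in the text.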

The distributions of IQ’s for the 1930 and 1940 groups are shown in 
Table II. The median IQ for the 1930 group, 82 ± .40, has increased 
over ten points to 93 ± .25 in 1940. This gain is further shown by a 
study of the percentage of overlapping: seventy-four per cent, or about 
three-fourths, of the 1930 cases are below the 1940 median. In 1930 
the median IQ classified the children as a dull group, while in 1940 
the group is within the normal classification. The wider range in 
1940 is largely due to the increased number of cases. 

Since 1930 there has been a noticeable IQ gain in all types of schools. 
In 1930 there was a greater tendency for the IQ to increase with the 
size of school. This trend was also shown in Baldwin’s study, where 
Iowa farm children in one-room schools had a median IQ of 91.7, 
against 99.4 in the consolidated schools. The fact that there is less 
difference in the IQ among different types of schools in Tennessee 
mountain areas than in rural Iowa may indicate there is more 
uniformity of instruction and educational facilities in the mountain areas; 
perhaps the one-room schools are not so poor nor the larger schools 
as good as the consolidated schools in Iowa. Greater uniformity 
in instructional practices in 1940 may also be a factor in decreasing 
the differences among the various types of schools in the mountain 
area. 

Table III shows the median IQ of the 1940 group is consistently 
higher at all ages. The differences are statistically significant at 
all ages except sixteen, where the limited number of cases tends to 
increase the PE’s. Similar trends are seen in comparing the ranges 
of the two groups, and the first and third quartiles. Both groups 
show a decline in IQ with increasing chronological age. 
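
The significance statement can be illustrated with the usual rule for the probable error of a difference between two independent medians, the square root of the sum of the squared PE’s, which reproduces the "Difference" PE’s printed in Table III to within rounding. The sketch below is a hedged illustration: the cutoff of 3 for the ratio of a difference to its PE is an assumption made here, since the text does not state the criterion used, and the function names are hypothetical.

```python
import math

# Illustrative only: the cutoff is an assumption, not a criterion stated in the study.
ASSUMED_CUTOFF = 3.0

def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference between two independent medians."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)

def critical_ratio(diff: float, pe_a: float, pe_b: float) -> float:
    """Difference divided by its probable error."""
    return diff / pe_of_difference(pe_a, pe_b)

# age: (median IQ 1930, PE 1930, median IQ 1940, PE 1940), from Table III.
TABLE_III = {
    6: (94.68, 2.03, 102.56, 0.64),
    7: (90.90, 1.38, 99.85, 0.66),
    8: (88.88, 1.09, 99.18, 0.70),
    9: (86.38, 1.79, 96.44, 0.74),
    10: (84.25, 1.64, 91.44, 0.73),
    11: (80.00, 1.49, 93.87, 0.88),
    12: (81.41, 1.06, 90.17, 0.80),
    13: (77.61, 1.22, 87.75, 0.87),
    14: (74.72, 1.09, 85.06, 0.75),
    15: (73.44, 1.39, 81.33, 1.13),
    16: (73.50, 2.41, 80.00, 2.08),
}

ratios = {
    age: critical_ratio(iq40 - iq30, pe30, pe40)
    for age, (iq30, pe30, iq40, pe40) in TABLE_III.items()
}

# Ages at which the gain falls short of the assumed cutoff.
below_cutoff = [age for age, ratio in ratios.items() if ratio < ASSUMED_CUTOFF]

for age, ratio in ratios.items():
    print(age, round(ratio, 2))
print("Below assumed cutoff:", below_cutoff)
```

Under this assumed cutoff only age sixteen falls short, where the two small groups (29 and 34 cases) carry the largest PE’s.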


OVER-AGENESS AND DECLINE IN IQ 


The study of mountain children in 1930 showed a consistent 
decrease in IQ with increase in chronological age from 94.7 at age six 
to 73.5 at age sixteen, a decline of 1.9 points a year. The study in 
1940 shows a similar decline from 102.6 at age six to 80 at age sixteen, 
an average of two points each year. The total decrease in IQ in 1930 
was 21.3 points, and in 1940, 22.6 points. In 1930 the children were 
within the normal intelligence group at age six; the 1940 data definitely 
show that the children are normal at age six. Some investigators 
explain this decline in IQ and the low average intelligence of mountain 
children as due to a poor heredity, caused by inbreeding and the 
superior families leaving the mountains for better economic and 
educational opportunities in other sections. Others base their 
explanations on the theory that the mind develops with stimulation, 
maintaining that, since the rural environment is less stimulating, there 
occurs a general decline in the rate of mental growth as the children 
become older chronologically. It has also been suggested that the 
relative placement of the items of tests standardized on urban 
children does not adequately measure the development of the rural child, 
especially in the older age-levels. 


TABLE IV. SHOWING THE EFFECT OF OVER-AGENESS ON THE DISTRIBUTION OF IQ’S 

                                        Per cent   Median 
                                        of cases     IQ        Range 
Grade I 
  6-year-olds (normal age)                51.3     102.56    75 to 165 
  7-year-olds (over-age one year)         30.8      92.82    65 to 120 
  Over-age two or more years              17.9      90.04    50 to 115 
  All cases                              100.0      95.14    50 to 165 
Grade II 
  7-year-olds (normal age)                44.6     104.00    75 to 166 
  8-year-olds (over-age one year)         30.2      94.00    60 to 150 
  Over-age two or more years              25.2      88.98    50 to 140 
  All cases                              100.0      95.66    50 to 166 
Grade III 
  8-year-olds (normal age)                40.5     103.28    75 to 160 
  9-year-olds (over-age one year)         25.7      91.79    65 to 155 
  Over-age two or more years              33.8      83.45    35 to 130 
  All cases                              100.0      92.84    35 to 160 
Grade IV 
  9-year-olds (normal age)                34.7     104.13    55 to 160 
  10-year-olds (over-age one year)        26.1      90.33    45 to 135 
  Over-age two or more years              39.2      82.68    45 to 135 
  All cases                              100.0      92.38    45 to 160 
Grade V 
  10-year-olds (normal age)               35.6     102.17    70 to 160 
  11-year-olds (over-age one year)        25.8      92.91    50 to 130 
  Over-age two or more years              38.6      79.00    50 to 125 
  All cases                              100.0      91.36    50 to 160 
Grade VI 
  11-year-olds (normal age)               32.8     103.44    60 to 160 
  12-year-olds (over-age one year)        31.3      93.09    65 to 130 
  Over-age two or more years              35.0      80.70    50 to 120 
  All cases                              100.0      92.41    50 to 160 
Grade VII 
  12-year-olds (normal age)               30.0     105.00    65 to 150 
  13-year-olds (over-age one year)        31.2      93.18    60 to 145 
  Over-age two or more years              38.8      79.80    50 to 130 
  All cases                              100.0      92.63    50 to 150 
Grade VIII 
  13-year-olds (normal age)               27.0     101.00    60 to 150 
  14-year-olds (over-age one year)        32.2      90.17    60 to 125 
  Over-age two or more years              40.8      88.52    40 to 115 
  All cases                              100.0      93.23    40 to 150 


TABLE V. SHOWING THE EFFECT OF RETARDATION ON THE DECREASE IN IQ 
WITH INCREASING CA 

                                         No.     Median 
                                        cases      IQ       PE 
Seven-year-olds 
  1. Retarded cases eliminated           124     104.00    1.00 
  2. All cases                           244      99.85     .66 
  Difference                             ...       4.15    1.19 
Eight-year-olds 
  1. Retarded cases eliminated           178     103.28     .83 
  2. All cases                           322      99.18     .70 
  Difference                             ...       4.10    1.08 
Nine-year-olds 
  1. Retarded cases eliminated           148     104.13     .98 
  2. All cases                           324      96.44     .74 
  Difference                             ...       7.69    1.22 
Ten-year-olds 
  1. Retarded cases eliminated           146     102.17    1.30 
  2. All cases                           383      91.44     .73 
  Difference                             ...      10.73    1.49 
Eleven-year-olds 
  1. Retarded cases eliminated           128     103.44    1.21 
  2. All cases                           358      93.87     .88 
  Difference                             ...       9.57    1.49 
Twelve-year-olds 
  1. Retarded cases eliminated            94     105.00    1.31 
  2. All cases                           365      90.17     .80 
  Difference                             ...      14.83    1.53 
Thirteen-year-olds 
  1. Retarded cases eliminated            76     101.00    1.47 
  2. All cases                           319      87.75     .87 
  Difference                             ...      13.25    1.70 

Over-ageness and age-grade retardation among rural and mountain 
children have been observed by various investigators, and present a 
serious problem in the rural schools of Tennessee. In order to study 
the influence of this factor on the decline of the IQ, Table IV separates 
from each grade all children who were over-age by one year and by 
two or more years.* In Grade I nearly half the children are over-age, 
and the retardation increases to seventy-three per cent in Grade VIII. 
For the normal-age groups the IQ remains normal at each grade level. 
The over-age group definitely lowers the total average for each grade. 
The effect of this age-grade retardation on the median IQ is shown in 
Table V. The retarded cases lower the IQ by amounts ranging from 
four points at ages seven and eight to 14.83 ± 1.53 points at age twelve. 
As over-ageness increases, and retardations accumulate from grade to 
grade, there occurs a corresponding decline in IQ, indicating that 
age-grade retardation causes the median IQ to decline with increasing 
chronological age. The IQ decreases with an increase in the amount of 
age-grade retardation. 
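
Why over-ageness depresses the IQ can be seen from the arithmetic of the ratio IQ, taken here as IQ = 100 × MA / CA. The text does not restate this scoring rule, so it is an assumption of the sketch, although the grade medians of Tables I and II are consistent with it. The function names and the three chronological ages chosen for the illustration are hypothetical.

```python
# Illustrative sketch; assumes the ratio IQ, IQ = 100 * MA / CA.

def to_months(age: str) -> int:
    """Convert a 'years-months' age such as '8-7' to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)

def ratio_iq(mental_age: str, chronological_age: str) -> float:
    """Ratio IQ for ages given as 'years-months'."""
    return 100 * to_months(mental_age) / to_months(chronological_age)

# A child working at the 1940 Grade III median mental age (8-7, Table I),
# at the normal age for the grade and one and two years over-age.
grade_iii_ma = "8-7"
iq_by_age = {ca: round(ratio_iq(grade_iii_ma, ca), 1)
             for ca in ("8-6", "9-6", "10-6")}

# Cross-check against Table II: the 1940 Grade III medians (CA 9-3, MA 8-7).
grade_iii_check = round(ratio_iq("8-7", "9-3"), 1)

print(iq_by_age)
print(grade_iii_check)
```

With mental age held at the grade level, each added year of chronological age lowers the ratio by roughly ten points, which is the pattern shown across the rows of Table IV. The cross-check gives 92.8 against the printed Grade III median of 92.84.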

The decline of the IQ with increase in chronological age is the same 
as it was a decade ago except that it is on a higher IQ level. Over a 
period of ten years the general level of intelligence of these mountain 
children has been raised ten IQ points. Although 
this investigation throws further light on the data and interpretations 
presented ten years ago, more research with other groups in other 
areas is needed before reaching any final conclusions as to the relative 
influence of nature and nurture on IQ changes. A check study should 
also be made to determine whether children in the general population 
score higher today on the Dearborn Tests than they did a decade 
ago. However, the general trends found in these mountain studies 
and in other investigations of rural children present a challenge for 
education: Large environmental changes appear to influence the IQ. In 
contrast to other social philosophies, our democratic ideals depend upon 
the opportunities each child has for developing his individual abilities. 


* Normal age for Grade I, 6 to 7; Grade II, 7 to 8; etc. 


SUMMARY 


(1) There is a general agreement among investigators that urban 
children rate higher on intelligence tests than rural or mountain 
children. 

(2) The majority of studies indicate a decrease in IQ with an 
increase in chronological age. 

(3) There are diverse opinions concerning the factors which cause 
rural-urban differences and the decline in IQ. 

(4) During the Spring and Fall of 1940, intelligence tests were 
given to three thousand two hundred fifty-two children in forty 
mountain schools of East Tennessee, and the results are compared 
with a similar study made ten years ago. 

(5) During the decade there has been definite improvement in 
the economic, social and educational status of this mountain area. 

(6) Today the average mountain child is about eight months 
younger chronologically and nine months older mentally for his grade 
than the average child of ten years ago. 

(7) The difference between the chronological and mental age of 
the average mountain child is now about one-third as great as it was 
a decade ago. 

(8) The 1940 group of mountain children is mentally superior 
to the 1930 group at all ages and all grades, as measured by the same 
tests. 

(9) The average mountain child has gained ten points in IQ, or 
nearly one point a year during the past ten years. 

(10) The average mountain child’s IQ decreases about two points 
each year from age six to sixteen. This is about the same rate of 
decline as was found ten years ago. 

(11) Over-ageness, or age-grade retardation, among mountain 
children appears to be the predominating cause of the decline in IQ with 
increase in chronological age. 

(12) The results of this investigation give further light on the 
findings of the 1930 study, and indicate that intelligence, as measured 
by these tests, may be improved with an improvement in educational 
and general environmental conditions. 


ON DIFFERENT FORMS OF LEARNING BY READING¹ 


GEORGE KATONA 
New York 


The primary aim of this investigation is to contribute to the analysis 
of that very common process of acquiring knowledge which may be 
called “learning by reading.” The reader seeks to learn about a 
subject-matter by reading a meaningful text once or a few times (without 
the intention of learning the text by heart), certain parts slowly, 
attentively and repeatedly, other parts quickly, in a superficial way and 
just once, all according to his individual preference. By limiting the 
investigation to the study of reading for the sake of understanding, we 
exclude from it the study of rote learning as well as the reading of 
unintelligible texts. Moreover, we defer the study of the effects on the 
learning process of variations in the interest, attention, motivation, or 
the age, personality, intelligence, and prior knowledge of the learner. 
Suppose that several interested mature persons of similar intelligence 
study texts on an unfamiliar subject-matter with the aim of acquiring 
knowledge. Then the question arises whether or not, and if so to what 
extent, the learning process will be affected by differences in the texts 
read. 

In the author’s previous investigation on meaningful methods of 
learning @ comparison was made between “mechanical memorizing” 
and ‘‘learning by understanding,’’ and differences in the results of the 
two forms of learning were established.? Different attitudes of the 
learners brought about by different instructions (e.g., “learn by heart 





1 The investigation was carried out during the author’s holding of a Fellowship 
of the John Simon Guggenheim Memorial Foundation. 

* Katona, G.: Organizing and Memorizing, Studies in the Psychology of Learning 
and Teaching. New York, 1940. Since this book discusses the literature on 
different kinds of learning processes, only a few relevant references will be made 
here. H. B. English distinguished two kinds of memory (verbatim and sub- 
stance) by using two types of test following one type of instruction (J. of Gen. 
Psych., Vol. x1, 1934, Vol. xx1, 1939, Psych. Rev., Vol. xiv1, 1939). T.R. McConnell 
(University of Iowa Studies in Education, Vol. rx, 1934) and T. W. Cook (Psych. 
Rev., Vol. x11, 1934) contributed to the study of the differences between intelligent 
learning (achieving insight) and habit formation. Theoretically as well as prac- 
tically important are the studies of W. A. Brownell on ‘Two Kinds of Learning in 
Arithmetic’’ (learning by repetition and learning by insight) in J. of Ed. Res., 
Vol. xxx1, 1938) and on “Learning as Reorganization, an Experimental Study in 
Third Grade Arithmetic” (Duke Research Studies in Education, 1939). 
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the following solution of a problem,” or ‘‘try to understand the follow- 
ing solution of a problem”’) were shown to be one, but not the only, 
means conducive to one or the other method of learning. In addition, 
different kinds of presentation of the material, its more or less appro- 
priate grouping and organization, were responsible for learning by 
understanding or the lack of it.1 The extension of that investigation to 
learning by reading is the more urgent as the question has not been 
studied before whether the distinction between “verbatim learning” 
and “substance learning” (defined simply as non-verbatim learning) 
is the same as that between memorizing and learning by understanding. 
Specifically, we have to ask whether understanding is the characteristic 
feature of all non-verbatim learning by reading. If in some instances 
this were not to be so, we should be compelled to differentiate between 
various forms of substance learning. 

Questions of method provide the initial difficulties for the investiga- 
tion. How should we test the different learning processes and what 
criteria should we use for the presence or absence of understanding? 
The most common way of testing verbal learning, to ask the learner to 
reproduce the text verbatim, may only produce evidence of certain 
memory functions. Therefore, questions on the substance of the text 
and its comprehension (e.g., “tell the essence in your own words”) will 
be used to a limited extent. Especially when some time has elapsed 
between learning and testing, such questions may help to determine 
what in substance the learner has retained of the text. We shall, 
however, use application questions as the main method of testing the 
results of the learning process. This type of test consists of questions, 
the answer to which had not been taught but can be found by applying 
what had been taught to a new situation. This testing method has 
some advantages over substance questions with reference to quantita- 
tive treatment and its immediate use after the learning process. Fur- 
thermore, the ability to apply a principle intelligently can, as in the 
previous investigations, serve as a criterion for the understanding of the 
principle. 

Postponing the question about the exact definition of application 
questions, we shall, at the outset, present only a scheme of the investi- 
gation, which involves a rough distinction between memory questions 
and application questions. 





1 The concept of appropriate organization is derived from the Gestalt approach 
to psychology as developed especially by M. Wertheimer (cf. Katona, op. cit., 
pp. 15 and 19). 
First Phase: Foretest.—Questions A, B, C are asked, and the subjects 
are unable to answer them. (In other words, only those persons— 
college students—are included as subjects who do not know how to 
answer these questions before reading the text.) 

Second Phase: Learning Period.—One half of the subjects reads one 
text, the other half another text. How the two texts differ will be dis- 
cussed later. Both texts contain the answers to the questions A, B, C, 
and neither contains the answers to certain questions X, Y, Z, which 
are related to the texts. 

Third Phase: Test.—Questions A, B, C, as well as X, Y, Z, are asked. 
A requirement of an ideal experiment would be that the readers of both 
texts should be able to answer the questions A, B, C, thereby bringing 
proof that learning has occurred between foretest and test and that 
the efficiency of that learning was the same with both groups. The 
two groups should, however, differ in their ability to answer the ques- 
tions X, Y, Z. This difference in the learning results should be due to 
differences between the texts read and the form of learning adopted 
because of the properties of the texts.1 


THE TEXTS 


In experiments designed to survey a field of investigation, it is per- 
haps permissible and even advisable not to use the method of isolated 
variation of one experimental factor. We shall not begin by analyzing 
one text and vary it in a definite direction. Such a procedure may be 
necessary later in order to determine the factors responsible for the 
differences obtained. But our endeavor is first directed to finding out 
whether such differences as envisaged in the scheme of investigation 
really occur. Therefore, we shall make use of two rather extreme forms 
of presentation, even though they may not be comparable in every 
respect. We shall attempt to formulate one text which should enable 
its readers to answer application questions and a second text which 
should not have that result, although the material for answering those 
questions is given in both texts. 

In searching for the highest grades of applicability, scientific 
theories come to mind. From laws, principles and theories established 





1 Asking the questions X, Y, Z, in the foretest is omitted intentionally. The 
analysis of the process of applying revealed that in order to have real application 
questions, a text should not be read from the viewpoint of the unsolved problems 
X, Y, Z. Therefore, these questions should not be mentioned prior to the learn- 
ing process. 
or proposed in science, new theoretical as well as practical knowledge 
has been derived and facts hitherto unknown, or occurring in the future, 
have been predicted. In order to obtain similar effects in the experi- 
ments, a theory should be presented in the most appropriate way. We 
shall attempt to present in one text a coherent context in which the 
relations and the meaning of the statements, as well as their sequence 
and grouping and the distribution of emphasis, are determined by the 
character of a comprehensive principle. 

Inability to apply what was learned has been shown as a conse- 
quence of mechanical memorization.1 Therefore, as the second form of 
presentation a meaningful text will be formulated with a structure simi- 
lar to that of the usual material of rote learning. Data, information, 
and statements of fact will be enumerated one after the other, without 
forming an integrated whole and without stress on their interrelation. 

Three quotations may serve to explain our selection of the features 
to be embodied in the two texts. William James in discussing the 
memory of the theorist says that 


. . . a rational system or what is called a science . . . is the greatest of labor- 
saving contrivances. It relieves the memory of an immense number of details, 
replacing, as it does, merely contiguous associations by the logical ones of 
identity, similarity, or analogy. If you know a law, you may discharge your 
memory of masses of particular instances, for the law will reproduce them for 
you whenever you require them.2 


K. Koffka in his chapter on “Facts and Theories” quotes the Latin 
adage “multum non multa.” Knowledge of multum is the knowledge of 
a rational system, the interdependence of facts, while multa refers to 
the number of facts known. According to the second meaning of the 
word 


a person who knows twenty items knows ten times as much as the person who 
knows only two items. But in another sense the latter person, if he knows 
those two items in their intrinsic relation, so that they are no longer two but 
one with two parts, knows a great deal more than the former, if he knows 
just twenty items in pure aggregation.3 


W. Koehler replies to the maxim, “let us have facts, not theories,” 
by showing that “a problem arises in a theoretical context,” for “the 





1 Cf. Katona, op. cit., Chapter V, where the problem of transfer of training is 
discussed extensively. 

2 James, W.: Talks to Teachers on Psychology, 1899. New Ed., 1939, p. 126. 

3 Koffka, K.: Principles of Gestalt Psychology. 1935, p. 5. 
scope of a theory tends to grow beyond the region of its original 
application.”1 

The two kinds of presentation used can be represented schematically 
in the following way: One text, which for the sake of briefness will be 
called Principle Text or Pr.T., has the form 


X (a,b,c,d). 


The items of information a,b,c, and d are united by their being parts of 
the unitary context given by the principle X. All items are presented 
together with their relation to the principle, an attempt being made to 
let the reader experience the ‘‘nexus” of the items. One could plan to 
select 


|b|d|c|a| 


as the second type of presentation in which the items should follow each 
other without a formulation of their relationship and without regard 
to the appropriateness of their sequence. In using that scheme, how- 
ever, the second text would have lacked material included in the first— 
the sentences containing the principle—which may be necessary to 
answer the application questions. Therefore, the following basic 
scheme was selected for the text which we shall call Enumerative Text 
or En. T.: 


|b|d|X|c|a| 


The principle is presented here as one of the several items enumerated 
in succession. 

Two sets of material were used in the experiments. The first one 
dealt with the economic problem of the relationship between variable 
and fixed costs, and the effects of the rigidity of certain cost items on 
profit fluctuations. That topic was chosen because its implications 
extend to various fields of economics, accounting and investment.2 
Some of the implications were discussed in the texts, while others were 
not mentioned there but had to be discovered by the subjects in order 





1 Koehler, W.: Dynamics in Psychology. 1940, pp. 117 and 125. 

2 Among the applications of the principle, that the larger the fixed costs the 
greater the profit fluctuations, should be mentioned the following: Advantages of 
expanding production as compared to starting new production; advantages of 
mass production; accelerated effects of a small decline in sales on business depres- 
sion; changes in marginal and in unit costs; the leverage factor in dealing in 
securities; distribution of risks and chances in bonds and stocks. 
that they should be able to answer the test questions. In this respect 
there was no difference between the two texts. 

The author has found that it was not always easy for business- 
men (security dealers, accountants) to find the answer to certain ques- 
tions of the type used in the tests. The difficulties, however, could be 
dispelled when the fundamental principles were explained clearly. 
Therefore in Pr.T., first a clear picture of the cost relationships and 
their effects was presented under simple conditions, by neglecting a 
great many factors which often conceal the effectiveness of the princi- 
ple. Then further information was presented in its relation to the 
principle. The entire text served to answer the question: Why do the 
profits of American industry fluctuate from year to year to a much 
greater extent than the value of its sales? 

The formulation of En.T. was comparatively easy because nothing 
else had to be done but to copy and abbreviate the statements in text- 
books of economics, accounting and corporation law, referring to the 
problems dealt with. The division into separate disciplines and chap- 
ters, and the order of the statements within each chapter, was preserved 
as found in most textbooks. Thus En.T. was meaningful and did not 
lack organization; it resembled a factual textbook in which definitions, 
classifications, and rules are stated one after the other. 

In making the two texts suitable for intelligent reading by college 
students, the two schemes given above could not be followed in every 
particular. In order to explain the interrelations between its state- 
ments, it was necessary to expand Pr.T. by adding to it short paragraphs 
of a type a₁ and c₁, which could not be included in the other text. 
Similarly, En.T. had to be enlarged by further items not given in Pr.T. 
The additional items were not directly connected with the test questions. 

After the completion of the experiment with economics, the author 
set out to duplicate the experiment with material taken from ele- 
mentary physics, in experiments performed at a different locality and 
with different subjects. Both texts served to teach the same parts of 
mechanics. Pr.T. began with an explanation of the principle of inertia, 
which led to the discussion of the concept of force. Velocity, acceler- 
ation, mass, and resistance, appeared within that context. Gravity 
was then introduced as a special type of force and the behavior of falling 
bodies was explained graphically. Finally, the text contained a dis- 
cussion of two forces acting at the same time. All relevant statements 
of Pr.T. were incorporated in En.T., which also contained definitions 
and formulas. En.T. consisted of six sections in alphabetic order and, 
therefore, did not have a textbook character. Inherently related parts 
were severed from each other and their interrelation was not empha- 
sized. In the physics-En.T. much less use was made of the device 
of embedding certain statements in different contexts than in the 
economics-En.T. 

The two texts on economics contained eighteen hundred words 
each; the two texts on physics, nine hundred words each. To college 
students who had not studied economics or accounting the topic of 
the economic texts was entirely new, while to those who had not studied 
physics the topic of the physics texts was vaguely familiar. The prob- 
lems were such that they interested the students. The experiments 
did not appear as a task undertaken solely in the interest of the 
experimenter, but most students felt that it was worth while to read 
the texts attentively. 


THE TESTS 


All application questions referred to situations which have not been 
directly mentioned in the texts. Thus, the physics application ques- 
tions dealt with the behavior of bodies dropped from moving objects; 
in the texts the joint application of the principles of inertia and gravita- 
tion was not discussed. Similarly, the differences in the operating 
expenses and the investment chances of various types of business enter- 
prises, which were not referred to in the texts, formed the topic of some 
of the economic application questions. The relation of the application 
questions to the long texts was not stated; the subjects had to find 
out whether anything previously studied—and if so, what and how— 
could be applied to the situation presented in the question. 

All the questions had one and only one correct answer. An effort 
was made to exclude questions with doubtful answers. Several ques- 
tions, however, had only two possible answers, one correct and one 
false. To permit a distinction between correct answers arrived at by 
chance and correct answers due to the full understanding of the prob- 
lem, the question “why” was added to many economic application 
tests: The subjects were asked to explain their answers.1 The task of 
judging the explanations made by the subjects proved to be quite 
simple. The stylistic abilities of the subjects and their readiness to 
find appropriate formulations were disregarded. Most explanations 





1 In the physics tests, in order to complete the experiment within a short time 
and on the basis of an entirely objective scoring method, the subjects were not 
asked to explain their answers. 
could be classified immediately as “the student has no idea,” “he 
refers to matters which have no connection to the question,” or “this 
is no guess, even though the explanation is not well formulated,” and 
“he does mention the essential factors.” In addition, many papers 
did not contain any explanation of the answer. Less than five per cent 
of the papers contained problematical explanations. Correct answers 
with good explanations were awarded a score of three points, with 
problematical explanations two points, with false explanations or with- 
out any explanation one point. To a few simple questions only scores 
of two and one points were given. In the physics tests all correct 
answers were scored one. 
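
The scoring rule just described for the economic application questions can be written out as a small function. This is only an illustrative sketch: the names `Explanation` and `score_economics_answer` are hypothetical, the score of zero for an incorrect answer is an assumption (the text states points only for correct answers), and the reduced scale used for the few simple questions is not modeled.

```python
from enum import Enum


class Explanation(Enum):
    """Judged quality of the subject's answer to the added question "why"."""
    GOOD = "good"
    PROBLEMATICAL = "problematical"
    FALSE_OR_MISSING = "false_or_missing"


def score_economics_answer(correct: bool, explanation: Explanation) -> int:
    """Return the points for one economic application question.

    Correct answer with good explanation: 3 points.
    Correct answer with problematical explanation: 2 points.
    Correct answer with false or missing explanation: 1 point.
    Incorrect answer: 0 points (assumption, not stated in the text).
    """
    if not correct:
        return 0
    if explanation is Explanation.GOOD:
        return 3
    if explanation is Explanation.PROBLEMATICAL:
        return 2
    return 1


def score_physics_answer(correct: bool) -> int:
    """In the physics tests every correct answer was scored one."""
    return 1 if correct else 0
```

Under this rule a correct guess without explanation earns one point, while a correct and well-explained answer earns three, which is how the scoring separates chance successes from understanding.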


THE EXPERIMENTAL PROCEDURE 


Three experiments were performed with both materials and both 
types of text.1 The stages “foretest” and “learning” were the same 
in the three experiments, but the last stage, the test, differed. In 
Experiment I memory questions were asked, the answers to which 
were given in both texts read previously. The aim of the experiment 
was to ascertain to what extent the readers of Pr.T. and En.T. learned 
what they were specifically taught. In the other two experiments 
application questions were asked. In Experiment II the subjects had 
to answer application questions while the text, which they read previ- 
ously, remained before them. They were encouraged to use the text 
in trying to answer the questions by being told that the text might 
help them in finding the answers. The aim of the experiment was to 
ascertain whether or not and to what extent the application questions 
could be answered on the basis of the two texts (by our subjects and 
within a certain time limit). Experiments I and II may be classified 
as preliminary; their results will be compared with those of Experi- 
ment III—the main experiment. The difference between Experi- 
ments II and III was that in the latter the texts were collected prior 
to the distribution of the test questions. In answering the questions, 
the subjects therefore had to rely on what they had learned previously. 
Experiment III serves to determine whether there was a difference in 





1 Most of the experiments with the economic material were carried out in the 
Psychology Department of the University of California in Berkeley. The class 
experiments on physics were conducted in the Business School of the College of 
the City of New York. Additional experiments with both materials were made in 
the Psychology Department of Columbia University, New York City. The author 
takes this opportunity to express his sincere thanks to the heads of the depart- 
ments and the instructors who permitted the use of their classes. 
answering the application questions between those subjects who had 
studied Pr.T. and En.T., respectively. 

After the distribution of the texts the subjects were asked in each 
experiment to read the text attentively, in the same way as they were 
used to reading a textbook in order to be able to answer certain questions 
to be asked later. They were told that different texts were being used, 
the purpose of the experiment being to find out which of the texts was 
the best. The reading time was set at fourteen minutes for the texts 
in economics and eight minutes for those in physics. Practically all 
subjects finished their first reading of the text before the reading time 
was over. After looking around, they then either resumed their read- 
ing spontaneously or were told by the experimenter that they had a 
few more minutes in which they could occupy themselves with the text. 

All the experiments, with the single exception of what will be called 
Experiment IIIc, began with a foretest and the exclusion of those 
students who succeeded in answering the foretest questions or reported 
experience in the subject-matter of the experiment. Experiment III 
was performed in three forms with the economic material. The main 
difference concerned the method of selecting the subjects; in addition 
there were slight differences in the tests. The experiments, which we 
shall call IIIa and IIIb, were conducted in the office of the experimenter 
in groups of two to six students who volunteered to participate. In 
Experiment IIIa undergraduate students and in Experiment IIIb 
graduate students and teachers were included. Whether a subject was 
given Pr.T. or En.T. was determined at random. The subjects of 
Experiment IIIc were more or less compelled to participate. The 
experiment was performed in a large class in educational psychology 
during the Summer session, which was composed of a variety of 
students, teachers and nurses. No foretests were made, and the 
groups were not equated. The first student received En.T., his 
neighbor Pr.T., and so forth, the texts being distributed alternately 
and the subjects asked to sign the copies they read, by which means 
it was found out later to which group a student belonged. 

All the other experiments—Experiments I and II, the experiments 
on physics, and those with the control groups—were conducted in 
classrooms by distributing the texts alternately to undergraduate stu- 
dents. They differed from Experiment IIIc inasmuch as foretests 
were applied first to exclude inappropriate subjects. 

The time limit set was twenty minutes for the application questions 
in economics and twelve minutes in physics. 
TABLE I.—COMPREHENSIVE TABLE OF RESULTS

                                        Number of   Average   Standard   Average in per cent
                                        subjects    score     error      of maximum score

Experiment I: Answers to Memory Questions
  Physics (max. score 7)
    Control group ....................     18         1.44      .27          20.6
    En.G.1 ...........................      9         4.11      .66          58.7
    Pr.G. ............................      9         4.33      .47          61.9
  Economics (max. score 9)
    Control group ....................     20         2.40      .28          26.7
    En.G. ............................     16         6.00      .68          66.7
    Pr.G. ............................     17         5.82      .43          64.7

Experiment II: Answers to Application Questions (Texts Present)
  Physics (max. score 9)
    En.G. ............................     10         5.30      .63          58.9
    Pr.G. ............................     11         5.73      .60          63.7
  Economics (max. score 19)
    En.G. ............................     12        10.47      .75          55.1
    Pr.G. ............................     10        10.77     1.18          56.7

Experiment III: Answers to Application Questions (Texts Absent)
  Physics (max. score 9)
    Control group ....................     16         3.25      .46          36.1
    En.G. ............................     19         3.58      .46          39.8
    Pr.G. ............................     19         5.47      .34          60.8
    Critical ratio between Pr.G. and En.G.: 3.30
  Economics
    Control group (max. score 19) ....     15         3.53      .45          18.6
    Exp. IIIa (max. score 19)
      En.G. ..........................     16         4.44      .55          23.3
      Pr.G. ..........................     17        11.94      .85          62.8
      Critical ratio of differences: 7.40
    Exp. IIIb (max. score 18)
      En.G. ..........................     20         6.00      .70          33.3
      Pr.G. ..........................     17        12.24      .94          68.0
      Critical ratio of differences: 5.32
    Exp. IIIc (max. score 19)
      En.G. ..........................     35         6.06      .63          31.9
      Pr.G. ..........................     32         9.62      .68          50.6
      Critical ratio of differences: 3.83

1 En.G. and Pr.G. are abbreviations for the groups which have studied En.T. 
and Pr.T., respectively. 
QUANTITATIVE RESULTS 


The results of the experiments are presented in Table I. It con- 
tains, first, the average scores of the various groups in the three experi- 
ments. Differences in the maximum scores obtainable impede the 
comparability of the average scores. Therefore, in the last column of 
the table the average scores are expressed in terms of the per cent of 
the maximum obtainable score. In that column 100 per cent would 
mean that all questions have been answered correctly by all members 
of a group; zero, that no question has been answered correctly by any 
member of the group. 
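
The conversion described above is a simple ratio. As a minimal sketch (the function name `percent_of_maximum` is hypothetical and is used here only for illustration), the last column of Table I can be recomputed from the average score and the maximum obtainable score:

```python
def percent_of_maximum(average_score: float, max_score: float) -> float:
    """Express an average score as a per cent of the maximum obtainable score,
    rounded to one decimal place as in the last column of Table I."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return round(100.0 * average_score / max_score, 1)


# Pr.G., Experiment III, physics: average 5.47 out of a maximum of 9
physics_pr = percent_of_maximum(5.47, 9)
# Pr.G., Experiment IIIb, economics: average 12.24 out of a maximum of 18
economics_pr = percent_of_maximum(12.24, 18)
```

Because the printed averages are themselves rounded, a recomputed percentage may occasionally differ from the printed one in the last digit.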

The variability of the average scores is indicated by the standard 
error of the averages, and the reliability of the differences between 
Pr.G. and En.G. in Experiment III by the critical ratios in terms of 
sigma. With regard to some of the main experiments a more complete 
presentation of the results is indicated. Figure 1 shows the range 
and distribution of the scores obtained in Experiments IIIa and IIIb 
(economics). The scores are reproduced in cumulative frequency 
polygons. We find there that in Experiment IIIa two members of 
Pr.G. scored 17 and ten members 12 or better; only one member of 
En.G. scored as high as 8 and only six members of that group scored 
6 or better. In Experiment IIIb there was one subject belonging to 
Pr.G. who made a perfect score of 18. In Experiment III performed 
with the physics material sixteen members of Pr.G., but only five 
members of En.G. scored 5 or better. 
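
The text does not state how the critical ratios were computed. The sketch below assumes the conventional definition, the difference between two averages divided by the standard error of that difference, with the latter taken as the square root of the sum of the squared standard errors of the two averages. The function name `critical_ratio` is hypothetical. With the averages and standard errors printed in Table I, this assumption gives values that agree with the printed critical ratios to within about 0.01, which is consistent with rounding in the table.

```python
import math


def critical_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference of two averages in units of the standard error of the difference.

    Assumes independent groups, so that the standard error of the difference
    is sqrt(se_a**2 + se_b**2).
    """
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_difference == 0:
        raise ValueError("standard error of the difference is zero")
    return (mean_a - mean_b) / se_difference


# Experiment III, Pr.G. versus En.G., figures taken from Table I
cr_physics = critical_ratio(5.47, 0.34, 3.58, 0.46)   # printed: 3.30
cr_iiia = critical_ratio(11.94, 0.85, 4.44, 0.55)     # printed: 7.40
cr_iiib = critical_ratio(12.24, 0.94, 6.00, 0.70)     # printed: 5.32
cr_iiic = critical_ratio(9.62, 0.68, 6.06, 0.63)      # printed: 3.83
```

A ratio of three or more was customarily treated as indicating a reliable difference, and all four comparisons exceed that value.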

The results of Experiment I indicate that the subjects learned 
something by reading the texts: Their answers to memory questions 
were far superior to those of comparable control groups who did not 
read any text. The scores of Pr.G. and En.G. may be considered 
equal with both materials. Reading one of the texts was as good as 
reading the other text when information included in both texts was 
to be recalled immediately after learning. 

The results of Experiment II show that it is possible to answer 
the application questions on the basis of both texts. There was no 
significant difference in favor of either text when the subjects were 
permitted to consult the texts during the testing period, the average 
scores of both Pr.G. and En.G. being near 60 per cent with both 
materials. (Without having read any text the application-question 
scores were 18.6 per cent and 36.1 per cent, respectively, with the 
questions in economics and physics, as shown by the achievement of 
the control groups printed under the heading of Experiment III.) The 
individual differences were here extensive, partly because some sub- 
jects lost much time by rereading the texts each time when they took 
up a new question. On the other hand, there were a few subjects 
among the members of Pr.G. who have not consulted the text at all 
when they were thinking about the questions and writing the answers. 

In Experiment III we find a highly significant difference between 
the performances of Pr.G. and En.G. The members of the first group 
answered the application questions much better than those of the 
second, in the experiments with economics as well as with physics. 
Pr.T. served as a substantial aid; En.T., however, only as a rather 
restricted aid in enabling its readers to solve the application problems. 
The average scores of En.G. were much lower than the scores made 
when the same subjects were allowed to retain the text during the test. 
With regard to Pr.G., however, Experiments II and III show prac- 
tically no difference: Answering the application questions after the text 
was removed was as good as answering the questions by consulting 
the text.1 





1 All three forms of the economic experiments brought similar results, though 
the lower critical ratio in Experiment IIIc may be due to the inclusion of a few 
subjects with economic training in En.G. in that experiment. In the physics 
By comparing the data reported in the last column of Table I 
we find: 

The scores of Pr.G. and En.G. are about equal in answering memory 
questions (Experiment I) as well as in answering application questions 
by means of searching the texts for help (Experiment II). The scores 
of Pr.G. are better than those of En.G. when the application questions 
are asked after the conclusion of the study of the texts (Experiment III). 

In the groups that studied Pr.T., there is no significant difference 
in the scores of the three experiments. In those that studied En.T., 
the score in Experiment III is significantly lower than that in the two 
other experiments. 


TABLE II.—COMPARATIVE DIFFICULTY OF THE VARIOUS PHYSICS TESTS 
While in Table I a battery of tests was treated as a unit, in Table II 
the results obtained with each physics application question are reported 
separately. With the exception of test 5, which caused great diffi- 
culties to Pr.G., this group was superior to En.G. in each of the nine 
physics tests, irrespective of the fact that certain questions were rela- 
tively easy and others difficult for both groups. To answer some of 
the questions the understanding and the use of the function of the 
principle of inertia appears decisive (e.g., test 1), while others are 
primarily concerned with the problem of gravitation (e.g., test 4) or 
force vectors (e.g., test 10). These differences have not affected the 
results; the superiority of Pr.G. over En.G. is about the same in 
different types of question. 

The analysis of the results obtained with the various economic 
application questions shows that the scores of En.G. ranged between 
21 per cent and 37 per cent, and those of Pr.G. between 35 per cent and 





experiment there was almost no difference between the score of En.G. and that 
of the control group. The latter is relatively high because of a number of lucky 
guesses—no explanations were required here—which was not the case with En.G. 
The members of that group left many questions unanswered when their deliber- 
ations brought no decision in the allotted time. 
75 per cent. (In these calculations the three forms of the experiment 
are not differentiated.) The scores of Pr.G. were without exception 
higher than those of En.G. Out of altogether ten questions the differ- 
ence exceeded 100 per cent in three and was statistically insignificant 
in two. One further question was used in these experiments. Its 
answer was given in En.T. but not in Pr.T., so that it was a memory 
question for En.G. and an application question for Pr.G. The results, 
not included in Table I, were 69 per cent for En.G. and 71 per cent 
for Pr.G. 


QUALITATIVE DATA 


A few experiments were performed one week after the learning 
period. The subjects were not told at the end of the first experimental 
session that the experiment would be continued, but on the eighth day 
the subjects who participated in Experiments IIIa and IIIc were asked 
to come to the office of the experimenter for a second experiment. 
Twenty-eight of those who read Pr.T. and twenty-nine of those who 
read En.T. could be gathered. They were asked: “What was the 
essence, the main idea, of the text you read a week ago? Write it 
down in your own words, in a few sentences.” Of the Pr.T.-readers 
twenty-one referred to the relationship of the proportion of rigid costs 
to the fluctuation in profits, which was, in the intention of the writer of 
the text, the main principle explained. Three papers of the En.T.- 
readers fell in the same category, while eighteen may be classified as 
“enumeration of topics.”1 After having finished, the subjects were 
given a second task: “Those of you who have not done so, should write 
down the most significant statement you can recall.” Only a few 
Pr.G.-members but many En.G.-members now changed their 
answers. The latter referred to not less than seven different topics, 
and the topic mentioned most frequently was found in six papers only. 

Various further differences in the behavior of the two groups can 
be reported. The answers to the application questions were, in 
general, much longer in the case of Pr.G. than of En.G. At the end of 
the experiment several Pr.G.-subjects complained about the shortness 
of the time granted and asserted that they could have written longer 
explanations if they had been given more time. A few papers 





1 Typical example: “I read about the difference between stocks and bonds, the 
various cost items, the profit and loss account.” The remaining papers contained 
sentences such as “something about business matters” or “I have forgotten 
everything.” 
contained discussions exceeding the scope of the text as well as the 
question. Furthermore, several Pr.T.-readers came to the experi- 
menter at the end of the experiment to ask questions such as, “is 
nothing done to prevent the large profit fluctuations?” Such questions 
indicate that Pr.T. induced some of its readers to think about the 
problems. On the other hand, a frequent comment by En.T.-readers 
consisted in the written or stated remark, “why do you suppose that 
I know the answer to this question?” The words “guess” or “I think 
it is so, but I don’t know why” were frequently found in these papers. 
Of course, there were a few En.G.-members who “solved the riddle” 
and saw the connection between the text and the questions; that is the 
reason why the variability was much higher in this than in the other 
group. At the beginning of the second test (one week after training) 
twice as many Pr.G.-members as En.G.-members reported that they 
had thought of the text in the meantime. 

Turning now to the procedure of the readers of Pr.T., it may be 
assumed, on the basis of their recollection of the essence of the texts, 
that these subjects did not learn a number of more or less unconnected 
statements, rules, or definitions, but acquired an integrated knowledge 
of a principle, or a few interconnected principles. The reproduction 
of the principle was, however, not sufficient to answer the application 
questions. The analysis of the subjects’ explanations, and later
discussions with the subjects about their procedure in solving the
application problems, revealed certain aspects of the processes of
“comprehension” and “applying.” For our present purposes it suffices to
state on the basis of those studies that the principles did not stick in 
the mind of the learners in a rigid, unchangeable form; on the con- 
trary, they were flexible and adaptable to new situations. A question 
may have first appeared as referring to something entirely new, but 
then a reorganization or transformation of either the question or the 
remembered knowledge took place, with the result that the two fitted 
and their common aspects or functional relations became manifest. 
The subjects made “discoveries” or saw new implications when they
thought of the problems in the light of their previously acquired under- 
standing of the effects of fixed costs or the principle of inertia. We 
may speak here of a reorganization of the memory traces and an 
extension of knowledge, which occurred by thinking at the same time 
of the test problem and the principle.¹





1 The procedure just described in solving application problems may perhaps
serve the purpose of defining them. The importance as well as the difficulty in 








It appears that the En.T.-readers were often unable to recenter
their recollections in the way required by the application questions.
They failed to solve a problem because (a) it appeared to be unrelated 
to everything that they read previously or (b) they connected it with 
a reproduced specific statement which did not apply to the given case. 
Three other factors—incorrect recall, mistakes made after the recall of 
the appropriate rule, answer based on superficial analogies—were effec- 
tive in a much smaller number of cases. 


DISCUSSION 


What qualities of Pr.T. are responsible for the adaptability of the 
knowledge acquired, and what qualities of En.T. for the lack of
adaptability? It is not possible to answer this question with full exactness
since the texts differed in many respects, and the relation of the appli- 
cation questions to the texts is rather complex. In a few instances it 
appears that those items of information which had to be combined in 
order to answer one or the other application question, were neighbors 
in Pr.T., but were scattered, or even embedded in different contexts, 
in En.T. With regard to certain other questions it may be attempted 
to explain the higher scores of the Pr.T.-readers by assuming that the 
information x (cf. kinds of presentation under subhead “The Texts”)
was emphasized to a greater extent in Pr.T. than in En.T. Without 
undertaking here to determine the ultimate causes of the differences 





differentiating application questions from memory questions has been well stated 
by E. Claparède:

“Intelligence consists in transferring to new situations such techniques which
were successful in different but analogous past experiences. What matters is to
grasp the functional identities under different aspects. Let us note how difficult
it is to apply this criterion in an experimental investigation. For, from which
degree of difference between two situations (old and new) will we be authorized to
say that there was an intelligent transfer and not a repetition of the reaction by
means of simple automatism?” (“La Genèse de l’Hypothèse,” Arch. de Psych.,
Genève, Vol. XXIV, 1934, p. 151.)

In preparing test questions, the author became well aware of this difficulty; 
there are, no doubt, questions concerning which one cannot decide in which 
category they belong. On the other hand, there are questions with regard to which 
the classification is unequivocal. It should not be asserted that the author suc- 
ceeded in selecting application questions of the latter type only. But his intention 
was directed toward the formulation of questions which could not be solved by 
mechanical recall and blind subsumption alone, but required the operation called 
“transformation” in order that the possibility of an application may be recognized.
in the organization of the two texts, the following hypothetical
statements may be put forward on the basis of the experiments.

It is possible to formulate texts which make it easy for the learner 
to acquire “applicable knowledge,” and others which make it difficult
for him to attain that result. A hierarchy of such texts can perhaps 
be established. At the lower end of such a scale we should find piece- 
meal juxtaposition of items. The greatest difficulty will probably be 
experienced if the sequence of the items is intentionally confused, so 
that the readers are misled about their inherent relations. An arbi- 
trary order (e.g., an alphabetic order) of the several items may come 
next on the scale. If the order of the items is determined meaningfully,
the acquisition of applicable knowledge may become easier, but here, 
too, a distinction should be made between instances in which the 
principle determining the sequence is difficult to grasp and others in 
which it evokes insight into the structure of the text. The upper end 
of the scale appears to be reached when there are no detached items 
at all, but the requirements of the unitary text determine the place 
and function of each part, and the hierarchy of the parts within the 
larger context is clearly given. These assumptions require
confirmation by means of different texts and different application questions.

In turning from the properties of the texts to the ways the texts 
are studied, the question arises whether the term “learning by
understanding” should be applied to all instances of attentive reading. In
view of the great differences in the ability of the two groups to answer 
application questions, it does not appear to be useful to assert that all 
attentive reading of meaningful texts involves the process of real 
understanding.¹ Acquisition of specific information is a distinct
category of learning by reading, to be contrasted with the realization of the
full implication of the text.²





1 Thus we would not follow the argument of E. L. Thorndike in his article
entitled “Reading as Reasoning.” Thorndike says: “It seems to be a common
opinion that reading is a rather simple compounding of habits . . . (But) reading
is a very elaborate procedure . . . The act of answering simple questions about a
simple paragraph includes all the features characteristic of typical reasonings”
(J. Ed. Psych., Vol. VIII, 1917, p. 323). The questions asked by Thorndike were
not application questions.

2 M. J. Adler distinguishes the following types of reading: 1. for amusement,
2. for knowledge, (a) for information, (b) for understanding (How To Read a Book,
New York, 1940). Our analysis is related to, but not identical with, the following
description by Adler of what certain readers of a book did: “They did not have the
faintest understanding of what they had read. It was just words they had memo- 
The method of using application questions for the purpose of
determining the differences between two forms of learning has been 
employed in the author’s previous investigation, in which acquiring of 
mastery of certain tasks by memorizing and by understanding were 
compared. There, in solving the practiced tasks no difference was 
found between those who learned in the one or the other way, but in 
solving related new tasks the “understanding group” was distinctly
superior to the “memorizing group.”¹ The latter group, whose learning
may be characterized briefly as stamping in by frequent repetition,
was able to solve the practiced tasks but failed in solving new tasks. 
A similar result—ability to answer memory questions but failure to 
answer application questions—was obtained in the experiments 
reported in this paper, although the members of En.G. read a meaning- 
ful text for the sake of understanding it. In both cases the learning 
results were restricted within rather narrow limits to the recall of the 
specific information taught. It can, therefore, be assumed that one 
form of learning by reading is not essentially different from mechanical 
memorization; that is, the form which consists in accepting and com- 
mitting to memory the information read, just as it was presented. 
Information acquired in that way, the same as information learned by 
heart, lacks that organization which would make it a flexible and 
adaptable part of a greater whole. On the other hand, learning by 
reading, if it involves the understanding of a principle and the full 
realization of the principle’s implications, may lead to knowing more 
than was specifically taught. This is transfer in the fullest sense of 
the term. Its prerequisite appears to be the acquisition of flexible 
whole-qualities by means of organizing the learning material in a way
appropriate to it. 





rized” (p. 37). A pupil knew what the author “said,” but not what he “meant.”
Adler puts the blame on the inability of the readers. According to him very few
readers know how to read for understanding; they read poorly, irrespective of the
type of the text. Therefore, he wrote a practical book containing the rules which
a good reader must follow, whereas our goal was to analyze the different forms of
learning by reading which are available to most, if not all, readers (assuming a
proper grade of maturity and motivation).

A formulation by W. Koehler fits into this context provided a few words, 
by which Koehler restricts the validity of his statement to the special topic which 
he discusses, are omitted: “Within the more complete and continuous context . . . 
a given functional relationship may become completely understandable, whereas 
as a mere... rule it must be accepted without any further understanding” 
(op. cit., p. 48). 

1 Katona, op. cit., pp. 129 ff. 
Since there was no difference between the two groups in their
ability to answer memory questions, it hardly needs saying that
studying an “enumerative text” is not implied to be generally and in
all respects inferior to studying a “principle text.” There are many
purposes and educational aims for which no difference may be
discovered between the two types of text and the two forms of learning
by reading, or for which the first method may be more advantageous.¹
Yet, in studying the disadvantages of the acquisition of specific infor- 
mation the possibility presents itself that reading an enumerative text 
is only one of many instances in which this form of learning will occur. 
Our results give rise to further problems not studied in this paper. 
Is there a relation between variations in attention, interest or motiva- 
tion, and the occurrence of the one or the other form of learning by 
reading? The distinction between the two forms of substance learning 
may be of help in the study of the rôle of motivation, interest, and
attention, and their effect in determining or modifying the learning 
process. 

No reference has been made in this paper to the standard reading 
scales and tests of reading comprehension used in the schools. The 
relationship between those numerous investigations and this study 
cannot be discussed without further research. The present investiga- 
tions, carried out with college students, should be extended to school 
pupils. Moreover, it should be determined whether or not there is a 
correlation between the ability to answer memory questions and the 
ability to answer application questions (after reading the same text 
or different texts).² Finally, the question should be asked in relation
to this study, what the various reading scales really measure. Some 
of the questions used in those tests are obviously not application 
questions but require, in addition to adequate vocabulary, careful 
reading or attention for the purpose of finding or remembering certain 
facts stated in the paragraph read. To what extent questions which 
require more than mere recall are used in the comprehension tests has 
not been exactly established.*


1 Reviewing a well-understood subject-matter is one purpose for which En.T.
may be more suitable than Pr.T. (cf. G. Katona, “The Rôle of the Order of
Presentation in Learning,” Am. J. Psych., Vol. LV, 1942).

2 R. W. Tyler, who made the most extensive use of application questions after
science courses, concludes that “the relatively low correlations show clearly that
application is a mental process different from mere recall” (in C. H. Judd, Education
as Cultivation of Higher Mental Processes, New York, 1936, p. 13).

* Experiments or brief remarks by Pressey, Gates, Tyler, Judd, Tinker, J. C. 
APPENDIX


Owing to considerations of space only one set of material, the 
application questions for the physics experiments, will be reproduced 
here. The other test questions and all the texts used in the experi- 
ments are deposited with the Psychology Reading Room of Columbia 
University Library and are available there. 


1. A stone is dropped from the top of the tall mast of a fast ship. Some- 
body argues: Since the vessel advanced along its course after the stone was 
released, the stone struck the deck some distance aft of the base of the mast. 
Is that statement true or false? 

2. We have a new device through which we stop the ship a split second 
after the stone is released from the top of the mast. Where does the stone 
strike the deck, exactly under the point from which it was dropped, aft, or in 
front of that point? 

3. A pilot is flying over a railroad terminal in enemy territory. When 
should he release his bomb, when he passes directly above the terminal, earlier, 
or later? 





Three airplanes which are flying the same route plan to bomb a railroad
terminal. Airplane “Standard” (S) is flying at 3000 feet altitude with a
speed of 200 miles per hour; airplane “Fast” (F) is flying at 3000 feet altitude
with a speed of 400 miles per hour; airplane “High” (H) is flying at 6000 feet
altitude at a speed of 200 miles per hour.

All three airplanes release their bombs when they are directly above the 
terminal. Will the bombs fall at the same place, or one bomb on or near the
terminal and another bomb further away? In order to find out what you 
know about the distance of the places where the three bombs will fall, please 
answer the following questions: 

4. Will bomb of S or of F fall nearer to the terminal, or will they fall on 
the same place? 

5. Will bomb of S or of H fall nearer to the terminal, or will they fall on
the same place?





Dewey, and others may be quoted in confirmation of the assertions that we do not 
know what the reading comprehension tests really measure and that true compre- 
hension, or the ability to draw inferences, is different from recall. R. Gans has 
recently shown that “the abilities involved in a reference type of reading, which
requires recognition of material relevant to a problem, are not adequately measured
by reading comprehension tests” (Critical Reading Comprehension in the
Intermediate Grades, Teachers College, 1940, p. 109). R. Strang distinguishes a great
variety of “reading abilities” at the high-school and college level (Problems of
Reading in High School and College, Lancaster, 1940).
6. Will bomb of F or of H fall nearer to the terminal, or will they fall on
the same place?

In the next three questions we are interested to find out something about
the time the bombs need from their release until they strike the earth.

7. Does bomb of S or of F need the shorter time to fall, or do they need
the same time?

8. Does bomb of S or of H need the shorter time to fall, or do they need
the same time?

9. Does bomb of F or of H need the shorter time to fall, or do they need
the same time?

10. A rifle bullet is fired horizontally from the edge of the parapet on the
top of a high building. At the same instant the bullet leaves the gun another
bullet is dropped over the edge of the parapet. Which bullet will strike the
ground first?


Note: Question 8 was included for the sake of completeness, but 
the answer to it was not scored in the experiment because most students
answered that question correctly without any instruction. 
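
The physics behind questions 4 to 9 can be checked numerically. The following is only a sketch under idealized assumptions that are not part of the original test material: no air resistance, level flight, and g = 32.174 ft/s². The function names are illustrative.

```python
import math

G_FT_PER_S2 = 32.174           # assumed gravitational acceleration, ft/s^2
MPH_TO_FT_PER_S = 5280 / 3600  # miles per hour -> feet per second


def fall_time(altitude_ft):
    """Seconds for a released bomb to reach the ground (no air resistance)."""
    return math.sqrt(2 * altitude_ft / G_FT_PER_S2)


def forward_travel(altitude_ft, speed_mph):
    """Horizontal distance (ft) the bomb travels beyond the release point."""
    return speed_mph * MPH_TO_FT_PER_S * fall_time(altitude_ft)


# Altitude (ft) and speed (mph) as given in the appendix.
planes = {"S": (3000, 200), "F": (3000, 400), "H": (6000, 200)}

for name, (altitude, speed) in planes.items():
    print(name, round(fall_time(altitude), 1), "s,",
          round(forward_travel(altitude, speed)), "ft beyond the release point")
```

Under these assumptions the time of fall depends on altitude alone, while the forward travel is the product of the plane's speed and the time of fall.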

A repetition of the experiment, which was carried out in 1940, may be
difficult inasmuch as several articles in newspapers and popular magazines
have recently discussed the trajectory of bombs. Thus the
answer to the questions may now be known to many students without
any instruction.








THE EFFECT OF BILINGUAL BACKGROUND ON 
COLLEGE APTITUDE SCORES AND GRADE 
POINT RATIOS EARNED BY STUDENTS 
AT THE UNIVERSITY OF HAWAII* 


MADORAH E. SMITH 
Honolulu, T. H. 


The University of Hawaii draws its students from a population of 
very diverse antecedents, a large proportion of whom come from homes 
where another language than English is spoken. The majority are 
graduates of the public elementary and high schools of Hawaii or of 
private schools where English also is the language of instruction. 
However, some of them also have attended foreign language schools 
where they have received instruction in the languages spoken by 
their forefathers. It seemed desirable, therefore, to attempt to 
determine what effect, if any, this bilingual background has upon 
the achievement of the student body. 

To measure the extent of bilingual background, an adaptation 
of Hoffmann’s Bilingual Inventory¹ was used. The wording of the
inventory was altered to adapt it to older students. A question con- 
cerning attendance at picture shows given in another language was 
added. Also, since a very imperfect English,* the result of attempts
of people of many linguistic backgrounds to converse with each other,
is prevalent in the Islands, four questions were added asking: “Do
the following speak to you using sentences that are a mixture of two
or more languages?” “Do the following in speaking English to you
use pidgin English?” “In speaking to the following, do you use
sentences that are a mixture of two or more languages?” and “In
speaking to the following, do you use pidgin English?” The people
listed under each of these four questions were “father, mother,
grandfather, grandmother, brothers and sisters, and other relatives.”
These four questions were used in calculating what are called the 
Qualitative Scores in this study. They were scored separately, 
dividing the total scores which were obtained by the number of
sub-items in the same way as the Bilingual Scores are found by Hoffmann’s
method.





* This study was begun and nearly completed as a master’s thesis by Kenneth 
Q. Ching. After his death, the available raw data of his material were reworked 
with the help of Richard S. Takasaki. 
All freshmen entering the University are given the American
Council Psychological examination before entrance. These scores 
and the grade-point ratios of the students who were studied were made 
available for this research. All freshmen in Teachers College were 
given the Bilingual Inventory in 1935 and 1936, and in 1935 the fresh- 
men in the College of Arts and Sciences were also given the Inventory. 
The mean scores on the Bilingual Inventory are shown in Table I. 
It should be understood that almost all the students were born in 
the United States and are American citizens. Where racial designa- 
tions are given it is only for convenience instead of the longer way of 
saying students of a certain ancestry. A score of zero on the Hoffmann
Bilingual Inventory means English is the only language spoken; a 
score of forty, English is never spoken. Likewise a qualitative score 
of zero means standard English only is spoken; one of forty, the 
English spoken is always pidgin or mixed. 

The least English, and next to the poorest in quality, is used in the
homes of the students of Japanese ancestry, where it is spoken not quite
half the time; about one-fourth of the time it takes the form of
pidgin English, as is indicated by their qualitative score of 10.3. In
the Korean homes, English is used almost sixty per cent of the time; 
and in the Chinese homes, it is used nearly three times as much as 
their ancestral language. The Koreans, whose qualitative scores
averaged 12.8, however, use a poorer quality of English, and the
Chinese, who averaged 6.4, a better quality, than do the Japanese.
Sixty per cent of the Chinese parents were born in China. The 
Hawaiians use approximately seven times as much English as Hawaiian 
or other language and use even less pidgin English. Their qualitative 
score averaged only 4.0. The Caucasians, with an average qualitative
score of 0.7, claim they use almost nothing but standard English,
although ten per cent of all their parents, or half of those who were
foreign-born, were born in non-English-speaking countries. The Japanese
and Koreans are the least variable groups as to amount of English 
spoken. Probably that is because almost all of their parents are 
foreign-born while those of the Chinese and Caucasians include both 
foreign-born and American-born and the Hawaiian group includes 
those of part-Hawaiian ancestry. Although only two of the ninety- 
two parents of the last group were born in non-English-speaking 
countries, fifteen are of Chinese ancestry. 
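
The reading of a qualitative score as a share of time (10.3 taken as "about one-fourth") can be sketched as below. The sketch assumes that the 0-to-40 scale is linear, which the text implies but does not state; the function name is illustrative.

```python
MAX_SCORE = 40  # top of both the bilingual and the qualitative scale

# Mean qualitative scores quoted in the text.
qualitative_means = {
    "Japanese": 10.3,
    "Korean": 12.8,
    "Chinese": 6.4,
    "Hawaiian": 4.0,
    "Caucasian": 0.7,
}


def share_of_time(score, max_score=MAX_SCORE):
    """Read a 0-40 score as a fraction, assuming the scale is linear."""
    return score / max_score


for group, score in qualitative_means.items():
    print(f"{group}: about {share_of_time(score):.0%} pidgin or mixed English")
```

The same conversion applies to the bilingual scores, where zero means English only and forty means English is never spoken.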

The qualitative scores were not found to be significant in affecting 
the other factors studied. 
Table I shows the means and their standard errors for each racial
group of scores on the Bilingual Inventory, their percentiles on the 
college aptitude test and also the grade-point ratios each group earned 
during their first few semesters at the University. There were three 
hundred and eleven students of the original group of three hundred 
and sixty-four who remained three semesters. The grade-point ratios 
at the right in the table are based on three semesters of university 
work. The critical ratios of the differences between racial groups 
are calculated for the original group only. Those on the bilingual 
scores are all highly significant, every difference much exceeding the 
three times its standard error that is considered statistically significant, 
except that the difference between the Japanese and Korean groups 
is only 2.7 times its standard error. The average college aptitude 
score is highest for the Caucasians and lowest for the Hawaiians and 
Koreans while the Japanese and Chinese groups show an insignificant
difference between themselves. The last two groups are both signifi- 
cantly lower than the Caucasians and considerably higher than the 
Hawaiians and Koreans. The difference between the two lowest 
groups is negligible. 
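
The criterion of three times the standard error can be illustrated with a short sketch. It assumes that the two groups are independent samples, so that the standard error of a difference is the root of the sum of the squared standard errors; the text does not say how the critical ratios were computed. The numbers in the example are hypothetical and are not taken from Table I.

```python
import math


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Difference between two means divided by the standard error of that
    difference, assuming the two samples are independent."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


# Hypothetical means and standard errors, NOT values from Table I.
cr = critical_ratio(20.0, 0.6, 17.0, 1.0)
print(round(cr, 2), "significant" if abs(cr) >= 3 else "not significant")
```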

These differences on the psychological examination cannot be 
considered indications of racial differences even of those representatives 
of each race resident in Hawaii, for there is a considerable difference 
in the sampling of the different races at the University. The 1930 
census gives the following percentages of young people fifteen to 
nineteen years of age from the races studied: Caucasians, twenty-one; 
Japanese, thirty-nine; Chinese, eight; Koreans, one, and Hawaiians, 
sixteen; while the corresponding percentages in the student body for 
the year the study was begun were thirty-two, twenty-nine, twenty- 
two, four and twelve, respectively, and among all freshmen studied 
they were seventeen, thirty-nine, twenty-six, six and thirteen,
respectively. It is evident, therefore, that the Caucasians, although
somewhat underrepresented in the classes studied, are somewhat
overrepresented at the University, and the Chinese and Koreans very
much so; the Hawaiians are a little underrepresented; and the
Japanese, although somewhat underrepresented in the student body, were
in exactly the same proportion in the groups studied as in the census. So the last two groups
may be somewhat selected samples. There is also a difference in the
proportions dropping out; more of the Caucasians and fewer of the 
Japanese than the average do. Among those remaining three semes- 
ters forty-three per cent were Japanese, twelve per cent Caucasians. 
More Caucasians attend or transfer to mainland colleges than do the
other races. 
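
The comparison of the census with the student body and with the freshmen studied can be laid out as ratios. The percentages below are those quoted in the text; the ratio itself (sample percentage divided by census percentage) is an illustrative device and not one used by the author.

```python
GROUPS = ["Caucasian", "Japanese", "Chinese", "Korean", "Hawaiian"]

# Percentages quoted in the text.
census_1930 = dict(zip(GROUPS, [21, 39, 8, 1, 16]))
student_body = dict(zip(GROUPS, [32, 29, 22, 4, 12]))
freshmen_studied = dict(zip(GROUPS, [17, 39, 26, 6, 13]))


def representation_ratio(sample_pct, census_pct):
    """Values above 1 mean overrepresentation relative to the census."""
    return sample_pct / census_pct


for g in GROUPS:
    body = representation_ratio(student_body[g], census_1930[g])
    fresh = representation_ratio(freshmen_studied[g], census_1930[g])
    print(f"{g:10s} student body {body:4.2f}   freshmen studied {fresh:4.2f}")
```

A ratio of 1 corresponds to the case of the Japanese freshmen, whose proportion in the groups studied equals their proportion in the census.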

In grade-point ratios the differences are small, but there are several 
significant differences for the three groups, Caucasian, Japanese and 
Chinese; those with higher college aptitude scores all earn significantly 
higher grade-point ratios than do the two groups with lower college 
aptitude scores. 

The students dropping out alter the averages but little. The 
bilingual scores are insignificantly higher for the group as a whole due 
mainly to the differential dropping out. Of the fifty-three leaving, 
thirty, or over half, were Caucasians or Hawaiians whose bilingual 
scores are very low. The Koreans remaining had slightly higher, the 
Chinese slightly lower scores than the original classes. The other 
groups’ averages remain practically the same. 

Except for the Chinese, those who remained averaged higher on 
the psychological test. As the highest per cent leaving were from the 
higher scoring Caucasian group, there is a drop of three-tenths of a 
point when the entire group is considered. 

The grade-point ratios of the students remaining are higher in every 
group compared; and, although the difference between the average 
of those remaining and the entire original group is not significant, 
it is over twice its standard error. This is, of course, what would be 
expected since unsuccessful students so much more often leave. 

Table II shows the correlations between the factors studied. For 
those staying three semesters, correlations are not calculated separately 
for the smallest groups. For the group as a whole there is a low 
negative correlation between college aptitude scores and the extent 
to which a foreign language is spoken in the home. This is true 
whether the original group or those remaining is studied and the 
correlation is considerably increased for the original group by removing 
the influence of grade-point ratio. Apparently the persistent student’s 
college aptitude score is less affected by the bilingual environment 
of the home. A similar low negative correlation is found for each pair 
of the racial groups except of the Japanese where there is almost zero 
correlation. Whether the Japanese group accepted at the University 
are a more selected group as their lower proportion in the student body 
and their slightly higher grade-point ratio suggests, or their greater 
homogeneity as shown in their low coefficient of variability accounts 
for the absence of correlation in their case, it is impossible to say. 
It is possible that those students whose bilingual background affected 
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their college aptitude scores were not admitted to the University. 
The negative correlation of all other groups and of the combined 
groups between bilingual and college aptitude scores suggests that 
our students are still handicapped by bilingualism at the time of 
college entrance. 
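
"Removing the influence" of a third factor can be sketched with the standard first-order partial-correlation formula. The text does not state which formula was used, so this is an assumption; the input values are the first-semester coefficients for the Japanese group in Table II.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# First-semester coefficients for the Japanese group, from Table II.
r_bs_ca, r_bs_gpr, r_ca_gpr = 0.044, 0.103, 0.490

print(round(partial_r(r_bs_ca, r_bs_gpr, r_ca_gpr), 3))   # BS and CA, GPR constant
print(round(partial_r(r_bs_gpr, r_bs_ca, r_ca_gpr), 2))   # BS and GPR, CA constant
print(round(partial_r(r_ca_gpr, r_bs_ca, r_bs_gpr), 2))   # CA and GPR, BS constant
```

With these inputs the formula gives values that agree with the entries for the Japanese group in the lower half of Table II, which suggests, without proving, that the table was computed in this way.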


TABLE II. CORRELATIONS BETWEEN BILINGUAL SCORES (BS), COLLEGE APTITUDE
PERCENTILES (CA), AND GRADE-POINT RATIOS (GPR)

                             BS and CA       BS and GPR      CA and GPR
                   Number    r      PE       r      PE       r      PE
First semester
  Japanese.......    141    .044   .06      .103   .06      .490   .04
  Korean.........     21   -.247   .14      .431   .12      .306   .14
  Chinese........     94   -.131   .07      .216   .07      .447   .06
  Hawaiian.......     46   -.156   .10     -.069   .10      .128   .10
  Caucasian......     62   -.200   .08      .022   .09      .398   .07
  Total..........    364   -.150   .03      .156   .03      .415   .03
Three semesters
  Japanese.......    133    .074   .06     -.013   .06      .516   .04
  Chinese........     84   -.229   .07     -.033   .07      .738   .03
  Total..........    311   -.116   .04      .159   .04      .493   .03

Second order              BS and CA (GPR)  BS and GPR (CA)  CA and GPR (BS)
correlations                 r      PE       r      PE       r      PE
First semester
  Japanese.......          -.007   .06      .09    .15      .49    .04
  Korean.........          -.43    .12      .50    .11      .41    .13
  Chinese........          -.26    .07      .31    .07      .49    .05
  Hawaiian.......          -.15    .10     -.05    .10      .12    .10
  Caucasian......          -.23    .03      .11    .04      .41    .07
  Total..........     [illegible]  .03      .25    .03      .44    .03
Three semesters
  Japanese.......           .08    .06     -.06    .06      .52    .04
  Chinese........          -.31    .07      .21    .07      .76    .03
  Total..........     [illegible]  .04      .25    .04      .52    .03


The bilingual background, however, exerts little if any effect on 
grade-point ratio. Perhaps those cases that might be unable to over- 
come the handicap do not enter or else drop out. 
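
How little weight the correlations between bilingual score and grade-point ratio can bear may be judged by setting each against its probable error. The sketch below assumes the conventional formula PE = 0.6745(1 - r²)/√N; the paper does not state the formula, though it is consistent with the PE columns of Table II. The inputs are the first-semester coefficients and group sizes from that table.

```python
import math


def probable_error(r, n):
    """Conventional probable error of a correlation coefficient."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# (r for bilingual score vs. grade-point ratio, N), first semester, Table II.
bs_gpr = {
    "Japanese": (0.103, 141),
    "Korean": (0.431, 21),
    "Chinese": (0.216, 94),
    "Hawaiian": (-0.069, 46),
    "Caucasian": (0.022, 62),
    "Total": (0.156, 364),
}

for group, (r, n) in bs_gpr.items():
    pe = probable_error(r, n)
    print(f"{group:10s} r = {r:+.3f}  PE = {pe:.2f}  r/PE = {r / pe:+.1f}")
```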
College aptitude scores predict best the grade-point ratios of the
Oriental groups and also predict slightly better the marks for three 
semesters than for one. Removing the effect of bilingual background 
raises the correlation in all comparisons made. The majority of the 
correlations between grade-point ratio and bilingual score are low 
and positive, which suggests that the majority of the students able to 
pass the college entrance examinations despite bilingual handicap are 
better than average students. 

A question concerning attendance at foreign-language school had 
been added to the inventory. Perhaps because of its placement on 
the questionnaire, it was omitted by many students. However, a 
comparison was made for those who answered the question between 
those who did and those who did not attend foreign-language school. 
Almost all the Japanese, the majority of the Koreans and Chinese, 
and almost none of the Hawaiians attended one of these schools. The 
average number of years attended was also greatest for the Japanese. 
Ninety-eight Japanese averaged ten years of attendance at Japanese 
language school. Their college aptitude percentile scores averaged 
38.4, bilingual scores 21.5, qualitative scores 11.0, and grade-point 
ratios 2.38; while the three who were the only ones of that race who 
had failed to attend a foreign-language school averaged 39.0, 6.3, 
7.0 and 2.4, respectively. 

Fourteen Koreans had attended Korean language school an average 
of 5.9 years. Their average scores compared with those of the four 
students who had not attended were, on College Aptitude 23.8 as 
against 29.0; on bilingual inventory 17.9 against 15.5; on qualitative 
questions, 12.7 against 11.5, and on grade-point ratios 1.84 against 
2.05. 

Fifty-nine Chinese had averaged 6.7 years of attendance at Chinese 
language school. Their average scores compared with the fifteen 
students who had not attended were, on college aptitude 38.3 against 
42.1; on bilingual inventory 9.7 against 9.5; on qualitative measure 
8.5 against 7.7, and on grade-point ratio 2.35 against 2.17. 

Only three Hawaiians had attended foreign language school an 
average of nine years while thirty-two stated they had not attended. 
The average scores of the groups follow (that of those attending being 
given first): College aptitude percentiles 32.3 and 27.4, bilingual 
scores 8.3 and 5.0, qualitative scores 7.0 and 4.1, and grade-point 
ratios 2.10 and 2.02. For every group the average bilingual and 
qualitative scores were higher for those attending foreign language 
school, but the college aptitude scores were lower except for the
Hawaiians. 

Virginia McBride,* studying University of Hawaii students who 
were freshmen in 1931, also found the scores on the college entrance 
test of Chinese and Japanese students who attended foreign language 
school slightly lower than the scores of those who had not. Yet the 
grade-point ratios of these students averaged higher for those who had 
attended foreign language school than for those who had not. It may 
be that the less diligent students avoid foreign language school, or that it 
requires greater than average diligence in study for a student to pass 
high enough on the psychological examination to be admitted to the 
University, or that the student who previously has had to prepare 
lessons for two kinds of schools, when he has only university courses 
to prepare, finds the work relatively easy. 


SUMMARY 


(1) University of Hawaii students representing five different lines 
of racial ancestry were given an adaptation of Hoffmann’s Bilingual 
Inventory. 

(2) The groups, arranged according to the amount of English 
spoken in their homes from least to most, fall in the following order— 
Japanese, Korean, Chinese, Hawaiian and Caucasian. 

(3) On the college aptitude test the Caucasians ranked first, 
Chinese and Japanese groups next, and the Hawaiians and Koreans 
last. 

(4) The average grade-point ratios were highest for Japanese, 
next for Caucasians and Chinese with the Hawaiians and Koreans 
last. 

(5) Very low negative correlations were found between the scores 
on the bilingual inventory and the college aptitude test for all groups 
but the Japanese. 

(6) Correlations ranging from .31 to .76 were found between scores 
on the college aptitude test and grade-point ratios for all groups but 
the Hawaiian. 

(7) Students who reported attending foreign language school had 
higher bilingual and qualitative scores. 

(8) The grade-point ratios were higher in the case of Japanese and 
Chinese for those who attended foreign language school but the reverse 
was true for the other groups. 
(9) The college aptitude scores of those who attended foreign 
language schools were, except for a small group of part-Hawaiians, 
lower than for those who did not. 

(10) It would appear that bilingual background affects the college 
entrance examination scores of the students at the University of Hawaii 
much more than it does their achievement after entrance so far as that 
is measured by the grade-point ratio. 
TWO NEW MEASURES OF READING ABILITY 


FREDERICK B. DAVIS 
Coöperative Test Service of the American Council on Education, New York City 


The diagnosis of individual difficulties in reading has received an 
impressively large amount of attention during the past twenty years 
and gratifying progress has been made in teaching young children 
how to read. Now increasing emphasis is being given to the teaching 
of reading in secondary schools and colleges, a trend that prompts 
close study of the diagnostic tests available for students at the higher 
grade levels. 

Ideally, a diagnostic test in reading ought to provide reliable 
measures of the most important independent mental abilities and 
specific skills that are required in understanding the kinds of materials 
that students commonly have to read. No such test is now available, 
but as the first step in meeting the obvious need for it, a means of 
obtaining individual scores in two of the most important independent 
components of reading ability has been developed. The profile chart 
on which these two scores are recorded is shown in Fig. 1.¹ This 
“Profile Chart for Two Independent Scores Obtained from the Coöperative 
Reading Comprehension Tests, Form Q” is, so far as the writer 
knows, the first practical result of the application of techniques of 
factorial analysis to research in reading. Its use constitutes, the 
writer believes, the first significant innovation in the methods used 
to measure reading ability since the introduction several years ago 
of the repeating scale procedure for obtaining both Level of Compre- 
hension and Speed of Comprehension scores from the same set of 
test items.² 

When the profile chart is used according to the directions that 
accompany it, two new measures of reading ability are obtained. 





1 Copies of the profile chart and the necessary scoring stencils may be obtained 
from Frederick B. Davis, 15 Amsterdam Avenue, New York City. 

2 The repeating scale procedure was first applied to the Coöperative Literary 
Comprehension Test, Form OQ, and has since been applied to all forms of the 
Coöperative Vocabulary Tests and the Coöperative Reading Comprehension Tests. 

3 Complete directions for obtaining and interpreting the Word Knowledge and 
Reasoning in Reading scores are provided with the profile chart. Special scoring 
stencils are used to reduce the labor of scoring to an absolute minimum. 
Hand-scoring is recommended unless a very large number of answer sheets are to be 
scored; but machine-scoring may be employed. 

On the profile chart the standard errors of measurement for the two obtained 
scores are indicated by heavy black lines. In the center of the chart another 
heavy black line indicates the size of the standard error of the difference between 
obtained scores in the two components of reading ability that are measured. 

The first is Word Knowledge, which is very closely related to scores 
obtained from the familiar tests used to measure recognition vocabulary. 
The second is Reasoning in Reading. This is apparently 
a measure of ability to manipulate verbal concepts and relate them 
meaningfully. It is probably the first direct measure of what Thorndike 
referred to in 1917 as the reasoning elements in reading.¹ These 
two new measures of reading ability possess several characteristics 
that distinguish them from all ordinary reading-test scores; of most 
interest, perhaps, is the fact that the two scores are entirely uncorrelated. 
Additional information concerning them can best be provided 
in a brief sketch of their development. 

When the Coöperative Reading Comprehension Tests were planned 
three years ago, every effort was made to insure their validity as 
measures of reading ability. Among authorities in the field of reading 
there is general agreement that reading is fundamentally a thinking 
process. The various kinds of eye-movement skills involved in reading 
must be regarded simply as mechanical aids in the process of getting 
meaning. To establish the validity of the Coöperative Reading 
Comprehension Tests, a careful survey of the literature in the field 
of reading was made to determine what reading skills are generally 
regarded by authorities as the most important elements of reading 
comprehension. After the resulting list of skills had been classified 
and scrutinized, nine skills were selected for measurement. They 
were: 

(1) Knowledge of word meanings. 

(2) Ability to select the appropriate meaning for a word or phrase 
in the light of its particular contextual setting. 

(3) Ability to follow the organization of a passage and to identify 
antecedents and references in it. 

(4) Ability to select the main thought of a passage. 

(5) Ability to answer questions that are directly answered in a 
passage. 

(6) Ability to answer questions that are answered in a passage 
but not in the words in which the question is asked. 

(7) Ability to draw inferences from a passage about its contents. 

(8) Ability to recognize the literary devices used in a passage 
and to get its tone and mood. 





1 Thorndike, E. L.: “The Psychology of Thinking in the Case of Reading.” 
Psychol. Rev., Vol. xxiv (May, 1917), pp. 220-234. 
Thorndike, E. L.: “Reading as Reasoning: A Study of Mistakes in Paragraph 
Reading.” J. Educ. Psychol., Vol. viii (June, 1917), pp. 323-332. 
Thorndike, E. L.: “The Understanding of Sentences.” Elem. Sch. J., Vol. 
xviii (October, 1917), pp. 98-114. 
(9) Ability to determine a writer’s purpose, intent, and point of 
view; i.e., to draw inferences about a writer. 

A large number of items were constructed to measure each of these 
nine reading skills. The ingenuity exercised in devising these items 
was of crucial importance in this study. In assembling the final forms 
of the Coöperative Reading Comprehension Tests an effort was made 
to include in each form a certain proportion of items testing each one 
of the nine reading skills listed above. These proportions were based 
on the judgments of authorities in the field of reading concerning the 
importance of each skill in reading comprehension. 

After Form Q had been published, arrangements were made to 
administer the lower and higher level tests to a large number of 
college freshmen under conditions that would permit every student 
to attempt every item. Four hundred twenty-one students did so 
and their scores in each of the nine reading skills were obtained. The 
intercorrelations of these nine scores were then computed. After 
proof that there were no significant sex differences had been obtained, a 
factorial analysis was made, using the principal axes method described 
by T. L. Kelley in Essential Traits of Mental Life.¹ The nine principal 
components that were obtained were remarkably clear-cut and lent 
themselves to ready interpretation. This was probably because the 
items in each of the skills measured were selected on the basis of careful 
subjective judgment and because the influences of speed of reading and 
other mechanical elements in reading, such as accuracy of word perception, 
were minimized by the conditions of the test administration.² 

The two largest components accounted for eighty-nine per cent of 
the variance and were the only ones for which reasonably reliable 
individual scores could be obtained. A study of the loadings of the 
largest component indicated that it was clearly a measure of word 
knowledge. Since it is obviously necessary to know the meanings 
of words to read at all, each of the nine skills has a positive loading 
in this component. The loadings of both components are shown in 
Table I. 

The second largest component has its highest positive loadings in 
the two reading skills that require ability to infer meanings and to 





1 Kelley, T. L.: Essential Traits of Mental Life. Cambridge, Mass.: Harvard 
University Press, 1935. 
2 For a detailed report, see: Davis, Frederick B.: Fundamental Factors of 
Comprehension in Reading. Unpublished doctor’s thesis on file at the Harvard 
University Library. 
weave together several statements. This component may appropriately 
be called a measure of reasoning in reading. Some people 
may be puzzled by the fact that this component has a negative load- 
ing in skill 1 (knowledge of word meanings). The explanation 
undoubtedly lies in the fact that individuals who know accurately 
the meanings of a great many words are thereby given a head start 


TABLE I.—COEFFICIENTS OF NINE READING SKILLS THAT YIELD COMPONENTS I AND II

          Component I:      Component II:            Standard
Skill     Word Knowledge    Reasoning in Reading     deviation
  1           .8134              -.5712               11.6060
  2           .1844               .1235                3.2501
  3           .0568               .0538                1.7347
  4           .0274               .0482                1.1046
  5           .1069               .1492                2.4596
  6           .3406               .4687                5.6718
  7           .3360               .5799                5.8096
  8           .0784               .1048                1.8589
  9           .2329               .2532                4.0669














toward getting the meaning of what they read. Therefore, if we want 
to measure reasoning in reading independently of word knowledge, 
we must give individuals who are deficient in word knowledge a 
“handicap” and then see how well they reason when they are on 
equal terms with their fellows in word knowledge. 
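
The independence of the two components can be checked directly from the coefficients in Table I. The following is a minimal sketch in Python; it assumes that the tabled coefficients are weights applied to an individual's nine skill scores, which the article does not spell out, and the names `component_score`, `WORD_KNOWLEDGE`, and `REASONING` are illustrative only.

```python
from math import sqrt

# Coefficients transcribed from Table I, skills 1 through 9.
WORD_KNOWLEDGE = [.8134, .1844, .0568, .0274, .1069, .3406, .3360, .0784, .2329]
REASONING = [-.5712, .1235, .0538, .0482, .1492, .4687, .5799, .1048, .2532]


def dot(a, b):
    """Sum of products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def component_score(weights, skill_scores):
    """Weighted sum of the nine skill scores.

    Assumed scoring rule for illustration; the article's scoring
    stencils may apply the weights differently.
    """
    return dot(weights, skill_scores)


# Each set of weights should have length close to 1, and the two sets
# should be at right angles (sum of products close to 0).
length_1 = sqrt(dot(WORD_KNOWLEDGE, WORD_KNOWLEDGE))
length_2 = sqrt(dot(REASONING, REASONING))
cross = dot(WORD_KNOWLEDGE, REASONING)
print(round(length_1, 3), round(length_2, 3), round(cross, 3))
```

The large negative weight on skill 1 in the second set is what offsets the large positive weight on skill 1 in the first, so that the sum of products over all nine skills comes out near zero.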

As already stated, scores in the two new measures of reading 
ability described in this article are different from the kind of scores 
ordinarily obtained from reading tests. In the first place, scores in 
Word Knowledge and Reasoning in Reading are both orthogonal and 
uncorrelated. In the second place, the variance of the Word Knowl- 
edge score has deliberately been made as large as possible, while the 
variance of the Reasoning in Reading score is as large as it can be made 
after the variance of the Word Knowledge score has been removed. 
These special characteristics of the two scores make them easy to interpret 
and unusually efficient to use as descriptive data for given individuals. 
The reliability coefficients of the two scores are shown in Table 
II. These coefficients compare favorably with those obtained from 
most standardized tests, though the reliability of the Reasoning in 
Reading score is somewhat lower than we should like for purposes of 
individual diagnosis. 
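
What these reliabilities imply for a single student's score can be illustrated with the conventional standard error of measurement, SD × √(1 − r). The sketch below uses the reliability coefficients and standard deviations reported in Table II; the article does not state how the standard errors drawn on the profile chart were computed, so the formula is an assumption.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Conventional SEM: standard deviation times sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


# Standard deviations and reliability coefficients from Table II.
sem_word = standard_error_of_measurement(13.9, 0.95)
sem_reasoning = standard_error_of_measurement(4.8, 0.78)

# Express each SEM as a fraction of its own standard deviation so that
# the two scores can be compared on the same footing.
print(round(sem_word, 2), round(sem_word / 13.9, 2))
print(round(sem_reasoning, 2), round(sem_reasoning / 4.8, 2))
```

On this reckoning the error band for Reasoning in Reading is roughly twice as wide, relative to the spread of scores, as that for Word Knowledge, which is why the lower reliability matters for individual diagnosis.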

The relationships of scores in these two components to scores on 
known measures of mental ability and reading comprehension are of 


TABLE II.—RELIABILITY COEFFICIENTS OF WORD KNOWLEDGE AND REASONING
IN READING SCORES

                                  Reliability                Standard
                                  coefficient      Mean      deviation
Word Knowledge score                  .95          46.3        13.9
Reasoning in Reading score            .78          24.1         4.8














considerable interest. A beginning has already been made in deter- 
mining these relationships, although much remains to be done. 


TABLE III.—INTERCORRELATIONS, AFTER CORRECTION FOR ATTENUATION, OF
WORD KNOWLEDGE, REASONING IN READING, AND Q AND L SCORES OF THE
AMERICAN COUNCIL PSYCHOLOGICAL EXAMINATION (N = 121)

                            Reasoning
Test                        in Reading     Q score     L score
Word Knowledge                 .08           .41         .89
Reasoning in Reading           ...           .13         .07
Q score                        ...           ...         .58








Reference to Table III indicates that the corrected correlations 
between the Word Knowledge component and the Q and L scores of 
the American Council Psychological Examination are .41 and .89, 
respectively. As would be expected, the L score shows a markedly 
higher correlation with the Word Knowledge component than does 
the Q score. Since the corrected correlation between the Q and L 
scores is .58, it is not surprising to find that the Q score has a substantial 
positive correlation with the Word Knowledge component. The 
corrected correlations between the Reasoning in Reading component 
and the Q and L scores are .13 and .07, respectively, neither of these 
coefficients being significantly higher than zero. These interesting 
relationships show that the function measured by the Reasoning 
in Reading component is unrelated to the predominantly nonverbal 
elements represented by the Q score. This finding is in agreement 
with the evidence indicating that nonverbal tests measure rather 
specific mental skills and show little correlation even among them- 
selves. As a matter of fact, the component that has been called 
Reasoning in Reading probably measures the ability to manipulate 
verbal concepts and is unrelated to nonverbal abilities. It is of 
interest to note in passing that the corrected correlation between the 
Word Knowledge and Reasoning in Reading scores in this group of 
students was found to be .08. 
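
The correction for attenuation used in Table III divides an observed correlation by the square root of the product of the two reliabilities. As a hedged illustration, the sketch below works backward from the corrected value of .08, assuming that the reliabilities of Table II (.95 and .78) apply to this group of students; the article does not report the uncorrected coefficient, so the result is only what that assumption implies.

```python
from math import sqrt


def correct_for_attenuation(r_observed, rel_x, rel_y):
    """Spearman's correction: observed r divided by sqrt(rel_x * rel_y)."""
    return r_observed / sqrt(rel_x * rel_y)


def observed_from_corrected(r_corrected, rel_x, rel_y):
    """Inverse of the correction, for recovering an implied observed r."""
    return r_corrected * sqrt(rel_x * rel_y)


# Implied observed correlation between Word Knowledge and Reasoning in
# Reading, under the assumption stated above.
r_obs = observed_from_corrected(0.08, 0.95, 0.78)
print(round(r_obs, 3))

# Applying the correction to that value returns the tabled coefficient.
print(round(correct_for_attenuation(r_obs, 0.95, 0.78), 2))
```

Because both reliabilities are fairly high, the correction changes a coefficient this small by only about one hundredth.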

In another group of eighty-two students, the correlations between 
scores in the Word Knowledge and Reasoning in Reading components 
and the total score of the Nelson-Denny Reading Test were .86 and 
-.23, respectively, after correction for attenuation. The negative 
coefficient between Reasoning in Reading and the total score of the 
Nelson-Denny Reading test is slightly more than twice the size of its 
standard error. These data, combined with our knowledge of the 
fact that students very rarely have time to attempt all of the items in 
the test, suggest that the Nelson-Denny Reading Test is almost 
exclusively a measure of word knowledge and speed of reading. It 
apparently does not require the exercise of the mental skills found 
to constitute the second largest component measured by the Coöperative 
Reading Comprehension Tests, skills that are of primary 
importance in making inferences, drawing conclusions, and weaving 
together the ideas in a passage to get the meaning. 

If we grant that the Word Knowledge and Reasoning in Reading 
components of the Coöperative Reading Comprehension Tests measure 
fundamental abilities in reading comprehension, it becomes important 
to ascertain the extent to which other reading tests now commonly used 
in secondary schools and colleges measure these abilities. It is prob- 
able that most reading tests will be found to be largely measures of 
word knowledge. 

In clinical use, the two independent scores provided by the new 
profile chart shown in Fig. 1 may prove of value in determining whether 
a given individual would benefit most from learning exercises designed 
to increase his vocabulary level or his ability to relate the elements 
of a reading passage and make inferences therefrom. Work is now in 
progress to refine the measurement of the two independent components 
of reading ability described in this article and to make possible the 
measurement of several other important elements of comprehension 
in reading that have already been identified. Meantime, it is hoped 
that the profile chart now available will be of use as a practical tool 
for the measurement of reading ability among high-school and college 
students and will act as a stimulus to further research in the funda- 
mental processes of comprehension in reading. 





THE RELATIONSHIP OF CERTAIN FUNCTIONS 
TO EYE-MOVEMENT HABITS 


R. G. SIMPSON 
Carnegie Institute of Technology 


There has been much written on the subject of training the eyes 
to read during the past quarter century. While the information has 
not been too convincing, it has been helpful in arousing interest in 
the controversial subject of improving reading ability by improving 
the eye-movement habits. 

This study may be regarded as just another one of the numerous 
studies dealing with eye-movement habits in reading and related 
information. Its purpose is, in brief, to show the relationship, if 
any, between certain measures of eye movements of good and poor 
students. 

The studies conducted by the Reading Clinic of the University 
of Iowa have been most resourceful in supplying data on the subject 
of eye movements in reading. A few of these studies, notably the 
study made by Walker* a number of years ago as well as a later study 
made by Anderson,† have been most valuable in revealing the relationship 
of eye-movement habits of good and poor readers. These studies 
with others of the Iowa Clinic are typical of many well-controlled 
investigations in the field of eye-movement photography. 

During the past few years it has been possible to collect considerable 
information relating to eye-movement habits in our reading laboratory. 
Some of the information has been quite worth while in organizing 
our remedial reading program; and some of it has not been too impres- 
sive; yet it has not been entirely disappointing. It is possible, how- 
ever, that the information of this article will be interesting if not useful 
to teachers of reading. 

The procedure employed to obtain the sampling of cases for the 
study consisted of analyzing the placement records of a large number 
of freshman students in the engineering classes. These records 
included each student’s performance on the general scholastic aptitude 
(mental ability) test, the reading ability test, the subject-matter 





* Walker, R. Y.: The Eye Movements of Good Readers. Psychological Monographs, 
Vol. xliv, 1933, pp. 95-117. 
† Anderson, Irving H.: Studies in the Eye Movements of Good and Poor Readers. 
Psychological Monographs, Vol. xlviii, 1937, pp. 1-35. 
achievement test, and in some instances, when the records were 
available, the spatial relations test. 

When the final analysis was completed, there were forty-seven 
freshman students, ten of whom were highest in the general rating 
scale, nine were above average, ten were average, eight were below 
average, and ten were lowest in the rating scale.* Not one of these 
students had any previous eye-movement training. The procedure 
was entirely new to all of them. As far as could be determined, none 
of the poorer students had any serious reading deficiency. 

The test data and the records collected for the study consisted 
of the average number of eye fixations per line, the average number 
of re-fixations (regressions) per line, the average duration of fixations, 
the rate of reading as measured by the eye-movement camera, the 
rate of reading as measured by the Iowa Silent Reading Test, the 
semester factor rating,† the mental ability rating, and the general 
rating which included the results of all placement tests. 

The test cards used in photographing the eye movements in reading 
were developed by the educational department of the American Optical 
Company for high-school and college levels. These cards are com- 
posed of relatively easy historical material and are not standardized. 
However, they seem to serve quite well the purpose for which they 
were intended. 

The correlations of the several measures and functions are given 
in terms of the Pearson r in the following tables. 


TABLE I.—THE CORRELATIONS OF CERTAIN EYE-MOVEMENT MEASURES WITH
THE SEMESTER FACTOR, MENTAL ABILITY, AND GENERAL RATING

                                     Semester       Mental         General
                                     factor         ability        rating
Eye fixations (average per line)    -.48 ± .08     -.38 ± .08     -.65 ± .06
Re-fixations (av. per line)         -.47 ± .08     -.52 ± .07     -.54 ± .07
Duration of fixations               -.06 ± .09     -.10 ± .09     -.04 ± .10

















* The general rating scale is based on the combined results of the general 
scholastic aptitude test, the reading ability test, the high school achievement test 
and the spatial relations test. 

† The semester factor is obtained by dividing the number of quality points 
(which are certain values assigned to letter marks) by the number of units in the 
student’s schedule. A unit is one hour per week per semester of the student’s 
time, whether recitation, laboratory or home preparation. 
The correlations between the semester factor and eye fixations 
and regressions are -.48 and -.47, respectively. These correlations 
are fairly high and indicate a definite relationship between scholarship 
and the eye-movement measures. Similarly, the correlations of 
-.38 and -.52 between mental ability and eye fixations and re-fixations 
are moderately high and indicate a definite relationship between 
mental ability and the eye-movement measures. The best correlations, 
however, are between general rating and the eye-movement 
measures. These are -.65 and -.54, respectively. It is conceivable 
that the results of the reading ability test, which were included in the 
general rating, aided materially in producing these high correlations. 
While all of these correlations reveal a definite relationship between 
the several mental functions and eye-movement measures, they are 
not, except possibly for those of general rating, too impressive. One 
may infer that there are other factors which also aid in controlling 
eye-movement habits in reading. 

The correlations between the semester factor and the average 
duration of fixations are low and unreliable. Likewise, the correla- 
tions of mental ability and general rating with the duration of eye 
fixations are low and unreliable. 
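
The article does not say how the figures that follow each coefficient in Table I were obtained. They are close to what the probable error of r customary at the time, 0.6745(1 − r²)/√N, would give for the forty-seven students, and the sketch below checks three of the tabled values on that assumption. The formula is an inference rather than the author's stated method, and it does not reproduce every tabled figure exactly.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


N = 47  # number of freshman students in the study

# Coefficients taken from Table I: eye fixations with the semester factor
# and the general rating, and re-fixations with the general rating.
for r in (-0.48, -0.65, -0.54):
    print(r, round(probable_error_of_r(r, N), 2))
```

On this reading a coefficient such as -.06 ± .09 is smaller than its own probable error, which agrees with the description of the duration-of-fixation correlations as unreliable.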


TABLE II.—THE CORRELATIONS OF EYE FIXATIONS AND RE-FIXATIONS WITH THE
RATE OF READING AS MEASURED BY THE EYE-MOVEMENT CAMERA AND THE
RATE OF READING AS MEASURED BY THE IOWA SILENT READING TEST

                      Rate (eye-movement     Rate (Iowa Silent
                      camera)                Reading Test)
Eye fixations           -.80 ± .04             -.48 ± .08
Re-fixations            -.69 ± .05             -.35 ± .09








The correlation between the rate of reading as measured by the 
eye-movement camera and the average number of eye fixations per 
line is -.80. This correlation is high and negative, presumably 
because the rate of reading as measured by the eye-movement camera 
is calculated with reference to the length of the film strip used in 
photographing the eye movements. In other words, the rate of read- 
ing as determined by the eye-movement camera is in reality the sum 
of the several fixations comprising the reading graph. In contrast 
to this rather high correlation, it will be noted that the one showing 
the relationship between the rate of reading as determined by the 
Iowa Silent Reading Test and the number of eye fixations is very 
much lower. It is only -.48. This is a very much lower correlation 
than one might anticipate. Likewise, there is considerable difference 
in the correlations between regressions and rate of reading as measured 
by these two methods. In all probability the explanation is that the 
rate of reading is measured in an entirely different manner by the 
Iowa Silent Reading Test than it is by the eye-movement camera. 
Nevertheless, the correlation between the rate of reading as measured 
by the Iowa Silent Reading Test and the rate as measured by the 
eye-movement camera is .70, which is considerable.* 

The results of some investigations show that the rate of reading 
may be improved by reducing the number of eye fixations and regres- 
sions per line. It would, therefore, appear only natural that there 
should be some relationship between the rate of reading and these 
factors. 

TABLE III.—THE CORRELATIONS OF THE SEMESTER FACTOR, MENTAL ABILITY
AND GENERAL RATING WITH THE RATE OF READING AS MEASURED BY THE
EYE-MOVEMENT CAMERA AND THE IOWA SILENT READING TEST

                                  Semester       Mental         General
                                  factor         ability        rating
Rate (eye-movement camera)       .45 ± .08      .45 ± .08      .61 ± .06
Rate (Iowa Reading Test)         .64 ± .06      .73 ± .04      .67 ± .05








The correlation between the semester factor and the rate of reading 
as measured by the eye-movement camera is .45. Coincidentally, 
the correlation between mental ability and the rate of reading as 
measured in the same manner is also .45. The correlations of the 
rate of reading as measured by the silent reading test with the semester 
factor and mental ability are .64 and .73, respectively. These corre- 
lations are much higher than the correlations between the same func- 
tions and rate as measured by the eye-movement camera. In all 
probability the discrepancy is due to the difference in the methods of 
measuring the rate of reading. It is interesting to note that this 
great discrepancy does not exist between the correlations of general 
rating and rate as measured by the two different methods. Of course, 
the general rating is partly based on a reading test, and this fact may 
account for the higher correlations. It appears, however, that 





* This correlation was calculated as a matter of general interest. 
scholarship and mental ability have a much closer relationship to 
the rate of reading as measured by the paper-and-pencil method 
than they have to the rate of reading as measured by the eye-move- 
ment camera. 


COMMENTS 


The eye-movement habits of good students and those who rate 
highest in mental ability are, in general, definitely better than the 
eye-movement habits of poor students and those who rate lowest in 
mental ability. While these correlations are not too impressive, they 
do support the fact that the central processes exercise certain control 
over the eye movements in reading. However, there are doubtless 
other factors which also exercise control over the eye-movement 
habits. 

The Iowa Studies show that the greatest differentiation in eye- 
movement habits of good and poor readers is revealed as the reading 
matter on the test cards increases in difficulty. These studies also 
reveal the fact that the purpose for which the person reads has much 
to do with the character of the eye-movement habits. 

Since some of the correlations of this study are not too high, there 
may be some justification in providing eye-movement training in 
order to improve the rate of reading of relatively easy printed matter. 
Furthermore, it is entirely possible that training in eye movements is 
much more valuable for improving reading ability among children in 
the lower grades than it is among high-school and college students. 

There is a possibility that some of the correlations between the 
several measures of eye movements and scholarship and mental ability 
would have been slightly higher but probably spurious had two extreme 
groups of good and poor students been employed in the study. How- 
ever, this is a matter of opinion. Obviously, there are certain to be 
some discrepancies in the correlations due to inaccuracies of measuring 
the eye movements in reading and other items not too well controlled. 
SUB-CORRELATION IN COURSE MARKING 


H. M. COX AND C. M. HARSH 


University of Nebraska 


When a large college course is divided into several sections, taught 
by different instructors, the problem of obtaining comparable course 
marks in the various sections is often met by the use of a common final 
examination. Although this permits comparison of the sections, 
it does not always result in comparable marks, as many instructors 
appear to think that it should. Obviously, if a course mark is derived 
from a weighted average of daily grades, quiz grades, and the final 
examination score, the level of the course marks in a given section will 
depend in part upon the instructor’s level of grading on those elements 
which are not common to all sections. Analysis of the course marks 
may show that students who make similar scores on the common final 
examination get much higher course marks in one section than in 
another. If these discrepancies between sections can be justified on 
the basis of other information, there is, of course, no general problem. 
But more usually there is probably no justification for such dis- 
crepancies, in which case the course marks can be made more compar- 
able with very little trouble, if the instructors are aware of the nature 
of the problem. 

This failure of a common final examination to render marks compar- 
able is well illustrated in a certain freshman mathematics course at the 
University of Nebraska which is taught in sections of about twenty-five 
students. The teaching staff collaborated on the final examination and 
considered it very comprehensive. Apparently it was, for although it 
counted only one-third toward the course mark the examination scores 
were highly correlated with course marks in each of the sections 
(Pearsonian coefficients .93 to .96). Yet when the six sections were 
combined there was a lower correlation (.90) between examination 
scores and course marks. Such discrepancies between correlation 
within sections and correlation in total populations can be investigated 
by analyzing the covariation into correlation within sections and 
correlation between sections. 

The upper part of Table I presents, for each section, the intercorrelations 
of quiz average (Q), final examination (E), and course 
mark (M). The lower part of the table presents the analysis of the 
total correlation (for combined sections) into the correlation between 
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“ 
TABLE I.—INTERCORRELATIONS BETWEEN Quiz AVERAGE (Q), FINAL 
EXAMINATION (Z), AND Course Mark (M) sy SEcTIons; 
FOR COMBINED SECTIONS; AND BETWEEN AND WITHIN 























SEcTIONS 

Section N TQR TQM TEM 

1 24 87 .93 .95 

2 25 83 .94 .96 

3 22 .85 .97 .93 

4 23 .80 .88 .93 

5 21 63 .70 .93 

6 20 92 .93 .94 

Combined sections.................. 135 71 .86 .90 
Between sections................... 6 —.17 .56 .60 
ee  ccccveweces = Ss .83 .89 .94 





Note that raw and rgy are correlations of parts with the whole. 


sections and the correlation within sections. The method of computa- 
tion employed was very similar to that presented by Snedecor (Statisti- 
cal Methods, 1937; pp. 220-221). 

Consider first the column of correlations $r_{QE}$ in the various sections.
Obviously the quizzes are measuring abilities related to those measured
by the final examination, for in each section $r_{QE}$ is a fairly large
positive correlation. In other words, in each section the students who
score highest on the quizzes tend to get higher final examination marks
than do the students who score low on the quizzes. The weighted average
of these coefficients is shown in the lower part of the table as the
within sections correlation of .83. Yet there is a -.17 correlation
between the mean quiz scores and the mean examination scores of the
sections. If there were insignificant differences between the sections
in mean quiz scores or mean examination scores this would not be
disturbing, for one would then expect a negligible correlation between
section means. But the F-test shows that both the quiz averages and the
final examination scores vary more between sections than would be
expected by chance once in one hundred times (i.e., the F’s are above
the one per cent limit of significance). The variation of section means
is shown graphically in Fig. 1.

If the quiz marks had been comparable in all sections one would
expect an appreciable positive correlation between section means.
Yet here we find that the section with the lowest mean ability as
measured by the final examination gets the highest mean quiz score, which
seems unreasonable if the teaching staff was correct in considering the
final examination to be comprehensive.

Figure 1 certainly suggests that the quizzes are marked more
liberally in sections 3 and 4 than in other sections, thus raising the
course marks in those sections. But Fig. 1 does not show the complete
picture of relationships between quiz averages and examination scores.
This is shown better in Fig. 2, in which the total correlation surface
($r_{QE}$) is analyzed by sections. The oval encloses all of the plotted
data. A circle indicates the mean examination score and quiz average for
a given section. Through each circle is drawn a line of best fit to show
the mean quiz averages of students with various examination scores.
The length of the regression line indicates the range of examination
scores of the middle two-thirds of the students in the section. It is
apparent that sections 3 and 4 are out of line with the other sections.
If their quiz averages were comparable to those of the other sections,
the means for sections 3 and 4 should have been considerably lower, as
indicated by the vertical dotted arrows. Then the total $r_{QE}$ would
have been raised from .71 to at least .83, for there would be a
significant positive correlation between sections rather than the
observed negative correlation of -.17.

One way of keeping section grades comparable is illustrated in a
social science course taught in several sections of from thirty to fifty
students. The objective (multiple-choice) quizzes are the same in all
sections, but written quizzes are different in all sections. The
comprehensive final examination contains both objective and written
questions and is the same for all sections. Before combining scores to
obtain course marks an instructor is asked to adjust his quiz averages
up or down so that their mean is close to the mean final examination
score of the students in his section. It would be even better to ask
that the variability, or range, of quiz scores be equalized, but even
without this refinement the course marks in different sections are made
quite comparable. Figure 3 shows the resulting picture when course
marks are correlated with final examination scores. In this case there
is a high correlation between sections, and the total correlation
$r_{EM}$ for all sections is the same as the average correlation within
sections (.88). The essential point is that no section is far out of
line with the others, so it can fairly be said that students of a given
level of ability (as judged from the final examination) tend to receive
the same range of course marks in all sections.
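
The adjustment asked of each instructor can be sketched as follows. This is a minimal illustration, assuming that quiz averages and examination scores are expressed on the same scale (for instance, per cent); the function name and the figures are hypothetical and are not taken from the course described.

```python
from statistics import mean

def align_quiz_to_exam(quiz_averages, exam_scores):
    """Shift one section's quiz averages so that their mean equals the
    section's mean final examination score. Differences between
    students within the section are left unchanged."""
    shift = mean(exam_scores) - mean(quiz_averages)
    return [q + shift for q in quiz_averages]

# Hypothetical section whose quizzes were marked liberally.
quiz = [80.0, 90.0, 100.0]
exam = [60.0, 70.0, 80.0]
adjusted = align_quiz_to_exam(quiz, exam)
```

Only the level of the quiz marks is moved; their spread is not, which corresponds to the simpler form of the adjustment described above rather than the refinement of equalizing variability.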

It should be noted that this equalization does not have the effect of
basing the course mark entirely on the final examination. It still
allows for the student who may be very good on quizzes but who may
do poorly in the final examination, or vice versa. In the social science
course we have used as an example, the final examination counted only
one-third in determining the course mark, and the quizzes counted
two-thirds. Thus the high correlation $r_{EM}$ results not from heavy
weighting but rather from the fact that the quiz scores correlated fairly
well with the final examination scores (.6 or better).

This example illustrates what may be done even when each instructor
in a course desires to maintain his individuality in the kind of
quizzes he gives. It simply requires that he withhold exact evaluation
of the quiz scores until, from the comprehensive final examination, he
can determine the mean level of ability of his students as compared
with those in other sections.

A simpler solution would be to have all sections use comparable 
quizzes as well as a common final examination, but many instructors 
are not ready to accept so much regimentation. We suggest that the 
above-described attempt to keep section marks in line will permit 
considerable individualism while protecting an instructor from gaining 
a reputation of being “easier” or “harder” than his colleagues.


RELIABILITY OF MULTIPLE-CHOICE MEASURING
INSTRUMENTS, A FUNCTION OF THE SPEARMAN- 
BROWN PROPHECY FORMULA, VI 


H. H. REMMERS AND R. M. ADKINS 
Purdue University 


In a research by Likert it was found that, upon increasing the
number of possible responses to each item of a multiple-choice test, 
the reliability of the test was likewise increased. It occurred to the 
senior author that, since increasing the number of responses to each 
item of a multiple-choice test might be construed as increasing the 
length of the test, a well-known formula, the Spearman-Brown 
formula, for predicting increase in reliability as the length of the test 
increases might be applicable in this case also. 

The first study in this series attempted to test this hypothesis
by the use of already published reliabilities, using tests which seemed 
to meet the requirements. The results were inconclusive, serving 
only to demonstrate that careful, scientifically controlled studies must 
be initiated, not only to prove or disprove the original hypothesis, 
but also to determine the scope or limitations of its use. 

The second study used a multiple-choice vocabulary test, ranging
in its forms from two responses per item to five responses per item. 
It was found that the reliability increased as the number of responses 
increased and that, within allowable limits, the increase in reliability 
was predictable by the Spearman-Brown prophecy formula. 

The third study used a social study scale with the number of
responses per item varying from two to seven. It was found that the
reliability increased as the number of responses increased and that this
increase in reliability could be predicted by the Spearman-Brown
prophecy formula. A possible breakdown of the hypothesis when the
number of responses is increased above five was indicated, however.

The fourth study used an arithmetic test with the number of
responses per item ranging from two to five and it was found that 
as the number of responses increased the reliability increased. This 
increase in reliability was in accord with values predicted by the 
Spearman-Brown prophecy formula. 

The fifth study used an attitude scale in which the number of
responses per item varied from two to seven on the various forms.
It was found that the reliability increased as the number of responses
increased; and, when weighted scores were used, this increase in
reliability was predictable by the Spearman-Brown prophecy formula.

Standardized tests were chosen as the bases for the present investi- 
gation. Algebra was selected as the subject for testing. 


PROCEDURE 


It was decided that this study should include only two-choice,
three-choice, four-choice, and five-choice items. Standardized tests
using this type of item and dealing, at least in part, with the first
semester of the beginning course in algebra were reviewed. Such
items as were suitable were taken from these tests and, with minor
changes, were combined into a test consisting of forty-four items.
Since the test was to be given in high schools of Indiana, “A Revised
Tentative Course of Study in Mathematics for Secondary Schools
in Indiana,” the most recent official course of study, was chosen as a
basis for the selection of items.

From the original test consisting of five-choice items, it was then 
necessary to eliminate enough incorrect responses to construct three 
equivalent forms, consisting of four-choice, three-choice, and two- 
choice items. It was assumed that some set pattern might have 
been followed in arranging the incorrect responses on the standardized 
tests, so the following device was used to eliminate incorrect responses 
by chance: Five small wooden cubes were made. On each face of the 
first cube was placed a “1,” on each face of the second a “2,” etc. 
until the fifth cube had on each face a “5.” 

The first operation in eliminating the incorrect responses was to 
determine the number of the correct response and the cube bearing
this number was then omitted. The four cubes that remained were 
placed in a narrow-necked, opaque container so that only one cube 
could be emitted at a time. The incorrect responses corresponding 
to the cubes as they appeared would thus be eliminated from the 
original five-choice item in making out the three equivalent forms of 
the test. For instance, if a three-choice item was desired, the four 
cubes representing the four incorrect responses would be placed in the 
container, mixed thoroughly, and the first two cubes to emerge would 
determine the incorrect responses to be eliminated in forming the 
equivalent item. 
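
The chance procedure with the cubes amounts to drawing, without replacement, from the incorrect responses of an item. A minimal sketch follows, assuming responses are numbered 1 to 5 as on the cubes; the function name `reduce_item` and its parameters are hypothetical, and the sketch draws independently for each form, which the text does not state one way or the other.

```python
import random

def reduce_item(correct, n_keep, rng, n_choices=5):
    """Return the response numbers kept for an n_keep-choice version
    of an item whose correct response is numbered `correct`."""
    # The cube bearing the number of the correct response is set aside.
    distractors = [k for k in range(1, n_choices + 1) if k != correct]
    # Cubes emerging from the container: drawn without replacement.
    dropped = rng.sample(distractors, n_choices - n_keep)
    return sorted(k for k in range(1, n_choices + 1) if k not in dropped)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
three_choice = reduce_item(correct=3, n_keep=3, rng=rng)
two_choice = reduce_item(correct=3, n_keep=2, rng=rng)
```

Whatever the draw, the correct response is always retained and only the required number of incorrect responses is removed.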

In this manner four equivalent forms of the test were arranged. 
Form A was the original test consisting of five-choice items, Form B
consisted of four-choice items, Form C of three-choice items, and
Form D of two-choice items.

Since this test was intended to be a test of knowledge with little 
regard for speed, the time limit was set high enough that, when the 
test was administered, approximately ninety-eight per cent of the 
students succeeded in answering all the items. 

The test was administered as an achievement test to one thousand 
thirty high-school students of all grades in Greene, Owen, and Clay 
Counties, in Indiana. Seven consolidated high schools in these 
counties participated in the testing program. Test Forms A, B, C, and 
D were arranged in order and distributed by chance. 

There were returned two hundred sixty-two Form A tests, two 
hundred fifty-six Form B tests, two hundred sixty-one Form C tests, 
and two hundred fifty-one Form D tests. 

In scoring the tests, correction was made for guessing according
to the formula

$$\text{Score} = \text{Right} - \frac{\text{Wrong}}{n - 1}$$

Scores for the odd and even items were determined in the same manner.
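
As an illustration of the scoring rule, the sketch below assumes that the denominator of the formula is one less than the number of possible responses per item, as in the usual correction for guessing. The function and parameter names are hypothetical, and the counts are invented for the example.

```python
def corrected_score(right, wrong, n_choices):
    """Rights minus a fraction of wrongs; omitted items are not counted.
    Assumes the divisor is (number of responses per item - 1)."""
    return right - wrong / (n_choices - 1)

# A hypothetical paper on the five-choice form: 30 right, 8 wrong, 6 omitted.
form_a_score = corrected_score(right=30, wrong=8, n_choices=5)
# The same counts on the two-choice form are penalized more heavily.
form_d_score = corrected_score(right=30, wrong=8, n_choices=2)
```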

Reliabilities for each of the four forms were determined by com- 
puting the self-correlation by the split-half method. After the scores 
for the odd items and for the even items had been determined, the 
Pearson Product-Moment procedure was applied. No attempt was 
made to determine the reliability of the whole test, since, because 
the correlations were to be directly compared, the reliability of the 
whole test would be of no more value than the reliability of the half 
test. The reliability for Form A was .801, for Form B .799, for
Form C .711, and for Form D .668.
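
The split-half computation can be sketched as below. This is a simplified illustration: it sums raw item scores for the odd and even halves, whereas in the study the half scores were first corrected for guessing. The function names and the small score matrix are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores: one list of per-item scores for each student.
    Correlates the odd-item half score with the even-item half score
    (no step-up to full length, as in the study)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd, even)

# Four hypothetical students on a four-item test (1 = right, 0 = wrong).
scores = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
half_test_r = split_half_reliability(scores)
```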

The reliabilities increased as the number of responses per item
increased. According to our hypothesis, this increase should within
the allowable sampling error be predictable by the Spearman-Brown
prophecy formula. The Spearman-Brown prophecy formula is

$$r_2 = \frac{n r_1}{1 + (n - 1) r_1}$$

where $r_1$ is a reliability already determined for a first form, $r_2$
is a reliability to be estimated, and

$$n = \frac{m_2}{m_1} = \frac{\text{number of possible responses for each item of Form 2}}{\text{number of possible responses for each item of Form 1}}$$

Using the obtained reliabilities, predictions were made for each of the
other forms as shown in Table I.
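
The predictions can be checked with a few lines of code. The sketch assumes the usual form of the Spearman-Brown formula, with the ratio of numbers of responses per item taking the place of the ratio of test lengths; the function name is hypothetical. The input is the obtained reliability of Form A reported above.

```python
def spearman_brown(r1, n):
    """Predicted reliability when the 'length' is multiplied by n.
    Here n = (responses per item, Form 2) / (responses per item, Form 1)."""
    return n * r1 / (1 + (n - 1) * r1)

r_form_a = 0.801  # obtained reliability of the five-choice form
pred_b = spearman_brown(r_form_a, 4 / 5)  # four-choice form
pred_c = spearman_brown(r_form_a, 3 / 5)  # three-choice form
pred_d = spearman_brown(r_form_a, 2 / 5)  # two-choice form
```

Rounded to three places these values agree with the predicted reliabilities .763, .707, and .617 given for Forms B, C, and D in Table I.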

According to Fisher, correlations of over .500 tend to be skewed
and are, therefore, too small. The correlations, both obtained and
predicted, were corrected by being transformed to “z” functions
(p. 215).

Our statistical procedure was as follows: The differences between
obtained z’s and predicted z’s were computed. Since the standard
error of z is $\sigma_z = 1/\sqrt{n - 3}$ and
$\sigma_{difference} = \sqrt{\sigma_1^2 + \sigma_2^2}$, the standard
error of a difference between two z’s where n is the same reduces to

$$\sigma_{difference} = \sqrt{\frac{2}{n - 3}}$$

This latter formula was used to compute the values in the column headed
$\sigma_{difference}$ (see Table I). The critical ratios between the
differences of z’s and the standard error of these differences were
computed.

TABLE I. COMPARISON OF OBTAINED AND PREDICTED RELIABILITIES

          r obt.   r pred.   z obt.   σ z obt.   z pred.   diff. z   σ diff.   CR

Reliabilities as Predicted from Form A
Form B    .799     .763      1.096    .063       1.003     .093      .089      1.045
Form C    .711     .707      .889     .062       .881      .008      .088      .091
Form D    .668     .617      .807     .064       .720      .087      .090      .967

Reliabilities as Predicted from Form B
Form A    .801     .832      1.101    .062       1.195     .094      .088      1.068
Form C    .711     .749      .889     .062       .971      .082      .088      .932
Form D    .668     .665      .807     .064       .802      .005      .090      .056

Reliabilities as Predicted from Form C
Form A    .801     .804      1.101    .062       1.110     .009      .088      .102
Form B    .799     .766      1.096    .063       1.011     .085      .089      .955
Form D    .668     .621      .807     .064       .727      .080      .090      .889

Reliabilities as Predicted from Form D
Form A    .801     .834      1.101    .062       1.201     .100      .088      1.136
Form B    .799     .801      1.096    .063       1.101     .005      .089      .056
Form C    .711     .751      .889     .062       .975      .086      .088      .977

A typical row of Table I is interpreted thus: From the five-choice
test, Form A, with reliability of .801, the reliability for an equivalent
form, Form B, is predicted to be .763. The obtained reliability for
Form B was .799. When the obtained and predicted reliabilities
of Form B are changed to z functions (values 1.096 and 1.003,
respectively), the difference of z’s is found to be .093. Since the
standard error of this difference as found from the formula above is
.089, the critical ratio is 1.045.
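
The row just interpreted can be recomputed as a check. The sketch assumes that Fisher's z is the inverse hyperbolic tangent of r and that n is the number of Form B papers returned (256); the function name is hypothetical. Because it carries unrounded values through, the critical ratio comes out near 1.04 rather than the tabled 1.045, which was evidently formed from the rounded difference and standard error.

```python
from math import atanh, sqrt

def critical_ratio(r_obt, r_pred, n):
    """Difference of Fisher z's divided by sqrt(2 / (n - 3))."""
    z_obt, z_pred = atanh(r_obt), atanh(r_pred)
    se_diff = sqrt(2 / (n - 3))
    return abs(z_obt - z_pred) / se_diff

z_obt = atanh(0.799)        # obtained reliability of Form B
z_pred = atanh(0.763)       # predicted from Form A
se_diff = sqrt(2 / (256 - 3))
cr = critical_ratio(0.799, 0.763, 256)
```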

An examination of Table I shows that all the differences between 
obtained and predicted reliabilities are well within the statistically 
allowable limits. In but few cases are the critical ratios more than 
1.000. 

Thus we can conclude that, with the test used and the population 
tested, the reliability increases as the number of possible responses 
per item increases. Secondly, this reliability can be predicted, within 
allowable limits, by the Spearman-Brown prophecy formula. 


SUMMARY AND CONCLUSION 


This study was made to determine whether the change of reliability 
of multiple-choice measuring instruments with a change in the number 
of possible responses per item is predictable by the Spearman-Brown 
formula. 

It was found that, for the instrument used and the population 
tested, as the number of possible responses per item increased the 
reliability increased, and that the increase was predictable by the 
Spearman-Brown prophecy formula. 


A RATIO FOR ESTIMATING THE RELIABILITY
OF TEST SCORES 


JOHN M. BUTLER 
University Testing Bureau, University of Minnesota 


The counselor, in using statistical measures in his work, is primarily
interested in knowing to what extent scores obtained are representative
of the true scores of his counselees on the tests they have taken. The
traditional index of the extent to which the obtained scores are
representative of the true scores is the reliability coefficient,
$r_{tt}$,* no matter how computed. This coefficient, however, is an
indirect estimate of the extent to which obtained scores approximate the
true scores, for what it actually tells one is that differences in scores
between individuals, from item to item, are consistent to the degree
indicated on the correlation scale. In other words, the coefficient
emphasizes the heterogeneity of the individuals tested in regard to
whatever is being measured by the test. The complementary expression,
$1 - r_{tt}$, shows the extent to which differences in score between
individuals, from item to item, are not consistent. Thus $1 - r_{tt}$
may be considered to be an index of reservation which may be used in
judging how closely the obtained scores approximate the true score. In
and of itself this is more directly informative to the counselor than is
$r_{tt}$. Essentially it gives the same information as the standard
error of measurement, $s\sqrt{1 - r_{tt}}$. To most people, however, the
standard error of measurement is more meaningful than $1 - r_{tt}$,
because it is a measure of absolute accuracy. That is, the standard
error of measurement is expressed in terms of the units of measurement
of the test used, whereas $1 - r_{tt}$ is a measure of relative accuracy
expressed in terms of the correlation scale and, furthermore, is not
derived from linear terms.

Baxter and Paterson have pointed out that the magnitude of the
standard error of measurement does not attain maximum significance
until it is related to the variability of the norm group because two
standard errors of measurement of the same size have the same meaning
only when the variability of the groups from which they were
derived is equal. In one case it may mean a large index of reservation,
i.e., when the variability of the group is small; in the other, it may
mean a small index of reservation, i.e., when the variability of the
group is large. They have, therefore, approached the problem of the
significance of the size of error variability by relating the standard
error of measurement to the total variability of the norm group.
They hold that their ratio, $s\sqrt{1 - r_{tt}}/s$, is useful because
relating the variability of individuals to the total variation gives
concrete, unambiguous evidence as to the ability of the test to
discriminate between individuals. This ratio, of course, reduces to
$\sqrt{1 - r_{tt}}$, the standard error of measurement of a standard
score, but by putting it in the form $s\sqrt{1 - r_{tt}}/s$, one can
readily see that $\sqrt{1 - r_{tt}}$ represents the percentage of the
total variation which can be ascribed to errors of measurement.

* The term “reliability” in this paper refers only to the self-correlation of a
test and not to test-retest reliability.

This ratio, although informative, has some drawbacks. In the
first place, in its present form it is not amenable to statistical
treatment. In the second place, one is probably more interested in
finding the relation of the variability which can be ascribed to the
functions being measured by the test (variability between individuals)
to the variability of the individuals (within-individuals variability),
than in the relation of individual variability to total variability,
which contains terms irrelevant to such a comparison. However, a
consideration of the nature of the reliability coefficient leads to a
direct and unambiguous solution of these problems. It has been
recognized that the reliability coefficient may be considered to be the
estimated percentage of the obtained variance which is true variance.
Since this is the case, it is obvious that $1 - r_{tt}$ represents error
or within-individuals variance; it is that variance which may be
ascribed to individuals rather than to what is being measured by the
test. Since $r_{tt}$ and $1 - r_{tt}$ represent variances, it is seen
that $r_{tt} + 1 - r_{tt}$ represents the obtained variance and that
$\sqrt{r_{tt} + 1 - r_{tt}}$ is the standard deviation. Then the
Baxter-Paterson ratio may be shown to include the following terms:

$$\frac{\sqrt{1 - r_{tt}}}{\sqrt{r_{tt} + 1 - r_{tt}}}$$

This expression clearly shows that an irrelevant term, $1 - r_{tt}$,
appears in the comparison. True, it cancels out, but cancellation does
not change the conceptual basis of the ratio and thus, in comparing the
numerator with the denominator, the effect of $1 - r_{tt}$ enters into
any judgement made. One way to avoid this feature of the ratio would be
to use

Fisher has developed a method for testing whether one variance
derived from $n_1$ degrees of freedom is greater than a second variance
derived from $n_2$ degrees of freedom, and Snedecor has presented a
convenient adaptation of Fisher’s method for which the values at
the 5 per cent and the 1 per cent levels of significance have been
tabled. Snedecor’s expression, which he calls F, is the variance
ratio: F = larger variance/smaller variance. The tables of F are
entered with the degrees of freedom associated with each variance.
Since $r_{tt}$ and $1 - r_{tt}$ are variances, it is obvious that
$F = r_{tt}/(1 - r_{tt})$ when $r_{tt} > .50$, and
$F = (1 - r_{tt})/r_{tt}$ when $r_{tt} < .50$. The number of degrees of
freedom associated with $1 - r_{tt}$ is $(N - 1)(n - 1)$ where N is the
number of individuals tested and n is the number of items in the test;
the number of degrees of freedom associated with $r_{tt}$ is $N - 1$.

Being variances and, therefore, not linear terms, $r_{tt}$ and
$1 - r_{tt}$ are hard to interpret in terms of test practice and their
F ratio is equally hard to interpret. However, if the ratio of their
square roots is found one has the ratio of two standard deviations. For
instance, if N is 21, n is 51, and $r_{tt}$ is .75, then
$r_{tt}/(1 - r_{tt})$ is $.75/.25$ or 3. Entering the F tables with
20 $(N - 1)$ and 1000 $(n - 1)(N - 1)$ degrees of freedom one finds that
F at the 1 per cent level of significance has a value of 1.52. Thus we
may safely conclude that the difference between the two variances is not
a chance difference. Now if the ratio of the variances,
$r_{tt}/(1 - r_{tt})$, is significant, then the ratio of their two
standard deviations must be significant also. The ratio,
$\sqrt{r_{tt}}/\sqrt{1 - r_{tt}}$, has a value of 1.72 and this value
may be interpreted as follows: The standard deviation of the group
tested (square root of between-individuals variance) is 1.72 times as
great as the standard deviation of the individuals (square root of
within-individuals variance). Since the value, 1.72, has a probability
of chance occurrence of less than .01, we would expect that in
estimating the true score of individuals from the obtained scores we
would make an error as great as, or greater than, one standard deviation
of the group less than once in one hundred times.

It must be emphasized that whenever the reliability coefficient
used is not the best estimate of the population reliability, the ratio
advocated shares in the deficiencies of the reliability coefficient and
is in error in proportion as the reliability coefficient is erroneous.
When the reliability has been calculated by the Hoyt method or by
formula 20 of the Kuder-Richardson paper an exact estimate may
be made.
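
The worked example can be retraced in a few lines. The sketch below is only an illustration of the arithmetic, with a hypothetical function name; it does not look up tabled values of F. Carried to three decimals, the square root of 3 is about 1.73, slightly above the 1.72 quoted in the text.

```python
from math import sqrt

def reliability_ratio(r_tt, n_persons, n_items):
    """Variance ratio, its degrees of freedom, and the ratio of
    standard deviations, for a reliability above .50."""
    f = r_tt / (1 - r_tt)
    df_between = n_persons - 1                   # associated with r_tt
    df_within = (n_persons - 1) * (n_items - 1)  # associated with 1 - r_tt
    return {"F": f, "df": (df_between, df_within), "ratio": sqrt(f)}

example = reliability_ratio(r_tt=0.75, n_persons=21, n_items=51)
```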

The calculation of a unique estimate such as the one given above, 
although useful, does not give all the information which can be used. 
The concept of fiducial limits makes the estimate we have derived 
from our sample much more meaningful. To set up the limits within 
which we can say with a given degree of confidence that our estimate 
falls will usually give one more insight into interpretation of test 
scores. Suppose, for instance, we want to be able to say that the 
fiducial probability is .98 that the population value, of which 1.72 is 
an estimate, does not fall above or below certain values. To find the 











upper limit we use the expression J (F) nt =te where F = 7 Mt 


eee 


and Fo; is the one per cent level of the variance ratio for 20 and 1000 
degrees of freedom; to find the lower limit we use +/F/2F 01) — \. 
Similarly for the fiducial probability of .90 we use the expression 


(F)(Fios) — 1 d aE me 
J 9 and oF 5 for the upper and lower limits, 


respectively. The use of interval estimation with unique values gives 
one more insight into the reliability of test scores and in most cases 
makes for more caution in their use. 

The writer believes that the ratio advocated has several advantages 
among which may be included the following: 

(1) The units are of equal length throughout the range of varia- 
tion. An increase in reliability from .93 to .94 as indicated by the 
reliability coefficient is much greater than an increase from .50 to 
.51, whereas an increase in the ratio of from 1. to 2. denotes an increase 
of reliability exactly equal in amount to the increase from 2. to 3. 

(2) The ratio calls attention to the relation of the variability of a
test score to the total variability, a characteristic which should be 
especially useful in vocational and educational counseling where the 
test score of the individual is all-important. 

(3) Because of the equality in units it is easier to evaluate the 
effect of differences of range of talent on reliability with the ratio 
than it is with the reliability coefficient. 


The expression $\sqrt{r_{tt}}/\sqrt{1 - r_{tt}}$ corresponds to a
statistic called the “sensitivity of a mental test” developed by Jackson
using the methods of analysis associated with the names of Neyman and
Pearson. This statistic is defined as the ratio of the standard
deviation between individuals to the standard deviation of the
individual. He has shown that this ratio, $\gamma$ (gamma), is a
function of $\rho$ (rho), the parameter value of r. This function may be
written in the form $\rho = \gamma^2/(1 + \gamma^2)$, which in turn may
be written $\gamma = \sqrt{\rho}/\sqrt{1 - \rho}$. Since the value of
$\rho$ is usually unknown, r must be substituted for it in the equation.
Thus $g = \sqrt{r}/\sqrt{1 - r}$ where g represents $\gamma$ as
estimated from the reliability coefficient. Jackson evaluates this
ratio by entering the normal probability tables with g as $x/\sigma$.
Therefore, if g is, say, 2.57, we may expect that in estimating the true
scores of individuals from the obtained scores we would make an error as
great as one standard deviation of the true scores but once in one
hundred times. The ratio g, however, will be exactly equal to the ratio
$\sqrt{r_{tt}}/\sqrt{1 - r_{tt}}$ only when the reliability coefficient
has been calculated by the Hoyt method or by formula 20 of the
Kuder-Richardson paper.

* Furnished the writer by Professor Palmer O. Johnson.
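
The relation between the sensitivity and the reliability coefficient can be turned around to ask what reliability corresponds to a given value of g. The sketch below assumes the relation between the two is that the squared ratio equals r divided by 1 minus r, as the expression for g implies; the function names are hypothetical.

```python
from math import sqrt

def gamma_from_rho(rho):
    """Ratio of between-individuals to within-individual standard
    deviation implied by a reliability rho (0 <= rho < 1)."""
    return sqrt(rho / (1 - rho))

def rho_from_gamma(gamma):
    """Inverse relation: the reliability implied by a given ratio."""
    return gamma ** 2 / (1 + gamma ** 2)

g_example = gamma_from_rho(0.75)      # the reliability used earlier
rho_needed = rho_from_gamma(2.57)     # reliability implied by g = 2.57
```

On this reading a value of g as large as 2.57 corresponds to a reliability coefficient of roughly .87.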


SUMMARY 


A ratio has been presented which gives essentially the same infor- 
mation as the Baxter-Paterson ratio but which is believed to be more 
accurate and more amenable to statistical treatment. The ratio is 
superior to the reliability coefficient when one is interested in inter- 
preting individual test scores and when one compares samples with 
differing ranges of talent. It has the advantage of equality of units 
throughout the range of variation. Under certain conditions the 
ratio assumes a value which is identical with that statistic which has 
been called the “‘sensitivity of a mental test.” 


BOOK REVIEWS


A. B. FITT. Seasonal Influence on Growth, Function and Inheritance.
Education Research Series, No. 17. Auckland, N. Z.: New 
Zealand Council for Educational Research, 1941, pp. 182. 


Fitt begins his monograph with this sentence: “For at least one
hundred and eighty years evidence has been accumulating in many 
countries and in the fields of several sciences on the possibility of a 
connection between the course of the seasons and changes in growth, 
development, and other human functions.” Fitt describes much of 
this evidence, adds to it his own extensive experiments, and proposes 
an interesting hypothesis to explain these seasonal fluctuations. He 
deals with both the physical and mental aspects of seasonal fluctua- 
tion. He presents his own data for monthly weight and height 
increases, for monthly scores on memory and cancellation tests, and 
also 22,356 IQ’s for school children from age ten to thirteen inclusive. 
While most workers in this field have tried to find some relationship 
between the season of birth and various physical and mental factors, 
Fitt maintains that the important point is the season of conception 
rather than the season of birth. “The human organism,” he says,
“suffers relatively little stress in its growth and function during the 
summer.” Hence we find greater gains in weight and in learning tests 
during the Autumn-Winter half of the year, and, furthermore, children 
conceived during this period show on the average slightly higher IQ’s. 

Fitt’s contribution in this monograph is his valiant attempt to 
synthesize all the various types of data dealing with seasonal fluctua- 
tions and to present his hypothesis of an annual internal organic 
rhythm, closely associated with endocrine balance, with the possibility 
that external factors, particularly solar light, may have some part in 
influencing this basic organic rhythm. The bringing together of all 
these varied data is in itself a valuable contribution, whatever one may 
think of Fitt’s hypothesis, and the book will certainly stimulate man’s 
curiosity to delve deeper into this field. RUDOLF PINTNER.

Teachers College, Columbia University. 


SAMUEL P. HAYES. Contributions to a Psychology of Blindness.
New York: American Foundation for the Blind, 1941, pp. 296. 
Part I of this book is organized as a “psychology of blindness” 
and deals with sensory compensation, facial vision, memory, intelli- 
gence, and mental status. The historical orientation of this discussion 


involves the listing of many false notions later disproved by controlled 
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observation. Some of this material might have been omitted or 
abbreviated to good advantage. Facts that are of general interest 
include the following: (1) Contrary to popular belief, the blind are not 
superior to the seeing in fineness of sensory discrimination. (2) 
Although ample evidence is presented to show that there is no “facial 
vision,” but that orientation to obstacles is in terms of auditory cues, 
the author is still inclined to believe there may be an “obstacle sense.”
(3) Data indicate no general compensatory superiority in the memory 
of blind children. (4) There is an apparent inverse relation between 
degree of vision and mentality. Explanations are suggested. In 
discussing inheritance of ability, the author uncritically cites the 
unsound and discredited views of Raymond Pearl. In fact, too much
of the author’s interpretation is based upon a psychology which has 
been discarded for many years. 

The material in Part II of this treatise is of great practical impor- 
tance. It deals with mental and educational measurement in schools 
for the blind. Available measuring devices are listed, detailed 
descriptions of procedures are given, and tentative norms recorded 
with interpretations. Advantages of measurement and a plea for 
expansion of the program are given. A few of the interesting findings 
may be cited: (1) Blind pupils are very slow readers. (2) In 
general the blind are decidedly inferior to seeing pupils in spelling 
(not true in all schools), somewhat inferior in arithmetic, but equal or 
slightly superior in composition writing ability. (3) Among the blind 
there is a definite annually increasing deficiency from grade to grade 
in vocabulary knowledge. (4) “In general, standardized tests of
school achievement show just about the degree of inferiority to the 
seeing which one would expect from their grade retardation.” 

Examination of the listed references indicates that there has been 
little research on the psychology of the blind during the past twenty 
years except in the field of educational measurement. The author’s 
adequate treatment of the latter makes this a “must” book for teachers
and psychologists dealing with the blind. MILES A. TINKER.

University of Minnesota. 


JOHN GOODRICH WATKINS. Objective Measurement of Instrumental
Performance. New York: Teachers College, Columbia University,
1942, pp. 88.


There has been a definite need for testing instrumental performance 
in music. Most performance tests (rarely using complete melodies) 
have been confined heretofore chiefly to measuring sight singing.
Watkins’s very excellent approach to his subject resulted in a series of 
tests based on a survey and analysis of many widely known cornet 
methods. The various symbols of music notation and instrumental 
difficulties were introduced in the order learned as deduced from 
preliminary study. 

Sixty-eight melodies were administered to a group of one hundred 
five cornet students representing various levels of ability. Two 
equivalent sets of fourteen each from these tests were selected as 
final Forms A and B, having been found to be equivalent in difficulty 
throughout their entire range. Based on the scores made by the 
one hundred five cases in the preliminary testing, Forms A and B 
correlated .982 with each other. The internal consistency of both 
forms of the test was high, and dispersions of the scores on the respec- 
tive exercises and on the entire test were approximately the same for 
both Forms A and B. Watkins concludes, therefore, that the test was 
an objective, reliable, and valid measure of cornet performance. 

The scoring unit was each measure instead of each note. One 
defect with this method (although otherwise preferable) is that if 
the subject is unnerved by an error at the end of a measure the play- 
over into the next measure might result in two errors counted, whereas 
the same errors embodied within the bar lines would result in only one 
error counted. A list of typical errors should have been included 
and analyzed. Having had the advantage of Wheelwright’s fine 
study of Perceptibility and Spacing of Music Symbols, it seems a pity
that the manuscripts used (they are reproduced in the Appendix)
did not more clearly differentiate the “naturals” from the “sharps.”
Also, the bars for 8th notes and 16th notes might have been thicker, 
and some of the notes better spaced and combined. 

In the main, however, the work is very thorough, and is a genuine 
contribution to education in the field of music research. 

LOUIS CHESLOCK.


Peabody Conservatory of Music, Baltimore. 


EDWARD B. GREENE. Measurement of Human Behavior. New York:
Odyssey Press, 1941, pp. 777. 


The purpose of this book is to provide a text for students in psy- 
chology, education and related fields. It is intended not only for 
those who may be called on to give tests but also for those who would 
find it necessary to evaluate or interpret the results of psychological
tests. 

In his attempt to acquaint the student with a wide variety of 
psychological and educational tests and methods the author is very 
successful. The [treatment] is comprehensive and thorough, and the style is clear
and understanding. 

The volume is arranged in three parts. Part I, “Basic Considerations,”
includes chapters titled: Introduction; Varieties of
Appraisals; The Interpretation of Scores; Measures of Relationship;
Measuring Instruments; Construction and Evaluation of Test Items;
and Factorial Analysis. The treatment of these topics is clear, with
emphasis on statistical methods. Discussions of validity, evaluation,
and interpretation of scores are concerned largely with statistical
relationships, and psychological questions such as “What does this
technique really test?”, “What does this score mean in terms of the
individual it attempts to describe?”, and “What can you do with a score
when you have it?” are relatively neglected.

Part II, “Instruments and Results,” is abundant with tables,
illustrations, and sample items from a surprisingly large number of
standardized tests. Topics covered are: Tests of Childhood; Meas- 
ures of Achievement; Binet-type Scales; Group Intelligence Tests; 
Performance, Mechanical and Motor Tests; Measurement in Fine 
Arts; Academic and Vocational Interests; Appraisal of Attitudes; 
and Modes of Adjustment. The chapter on the fine arts is particu- 
larly comprehensive. Tests are described and to some extent evalu- 
ated. Experimental results in each field are summarized with special 
reference to factorial analysis. 

From the point of view of one interested in the clinical application 
of these tests one could wish that fewer specific tests were presented 
and these examined more critically. Although reliability and stand- 
ardization are often discussed, the lack of information concerning the 
validity of many of the tests is not emphasized. Instead of a healthy 
scepticism and understanding of the limitations and misuses of 
these tests the elementary student can get an impression of a highly 
developed field where one can measure almost anything with great 
accuracy. Individual factors influencing test results are largely 
neglected. 

Part III, “Persistent Problems,” provides a stimulating discussion
of such major problems as: Effects of Practice on Test Scores; Meas- 
urement of Growth and Senescence; Standard Deviation or Absolute 
Scaling; The Evaluation of Judgments; and Measurement of Native
Differences. A twenty-eight page combined author index and 
bibliography is appended. 

This book provides a clearly-written and comprehensive introduc- 
tory text for students in courses dealing with mental measurements. 
It provides the student with knowledge of a wide variety of tests 
and methods, statistical procedures in the construction and evaluation 
of tests, and experimental results in the various fields where such 
techniques have been applied. The failure to stress individual 
factors limiting the use of many test scores and the absence of informa- 
tion concerning the validity of most of the tests in practice can be 
compensated for by the instructor. JULIAN B. ROTTER.

Norwich State Hospital. 





